EXPERIMENTAL 
DESIGN 

IN 
PSYCHOLOGICAL 
RESEARCH 


Also by ALLEN L. EDWARDS
Statistical Analysis for Students in Psychology and Education

By ALLEN L. EDWARDS
Professor of Psychology, University of Washington

RINEHART & COMPANY, INC.
PUBLISHERS, NEW YORK


First Printing May, 1950 
Second Printing January, 1951 
Third Printing January, 1953
Fourth Printing March, 1954 




COPYRIGHT, 1950, BY ALLEN L. EDWARDS 
PRINTED IN THE UNITED STATES OF AMERICA 
DESIGNED BY STEFAN SALTER 
ALL RIGHTS RESERVED 


Preface 




This book attempts to present to the student in psychology, 
education, sociology, and the behavioral sciences, some of the newer 
developments in statistical analysis, particularly with respect to 
small-sample theory, as they relate to problems of research and 
experimentation in these fields. I have assumed that the reader 
will have a working knowledge of algebra and that he will be 
familiar with the elementary notions of statistical methods as 
taught in the usual introductory course in applied statistics. 
Nothing more in the way of previous mathematical training is 
required for an understanding of the material presented. 

A number of hypothetical sets of data are interspersed with 
the results of actual experiments. I have attempted to arrange 
the examples at the end of each chapter so that some illustrate 
by fairly easy computations the kinds of analysis described in the
text. Some problems requiring more prolonged calculations have 
also been included. These examples are in all cases modeled after 
the methods presented in the chapters. I have also included in 
the examples, at various times, a brief discussion of a particular
point which the example has been designed to illustrate. I trust
these discussions will not be overlooked by the casual reader.
Answers to all examples are given in the appendix.

It will be obvious to the careful reader, but it is a pleasure
to acknowledge it here also, that I owe much to the various
publications of R. A. Fisher, Frank Yates, John Wishart, M. S. Bartlett,
G. W. Snedecor, W. G. Cochran, and others. I have tried to
acknowledge these sources at the appropriate places throughout
the text.

I should like to say also that my publishers, Rinehart and
Company, made it possible for my colleague, Professor Paul
Horst, to serve as a technical consultant on the manuscript. Dr.
Horst read all the chapters and his criticisms and suggestions have
resulted in a clarification of many important points. I am most
grateful for his assistance.

I am indebted to Professor Ronald A. Fisher and to Messrs. 
Oliver & Boyd Ltd., Edinburgh, for permission to reprint Tables 
IV, V, and VI from their book Statistical Methods for Research 
Workers. Table I is reproduced by permission of Messrs. Kendall 
and Smith and the Royal Statistical Society. Table VIII has
been reproduced from Professor Snedecor's book, Statistical 
Methods, by permission of the author and his publisher, the Iowa 
State College Press. Additional values of ¿ at the 1 and 5 per 
cent levels were also taken from Professor Snedecor’s book by 
permission. Portions of Table II have been taken from
Handbook of Statistical Nomographs, Tables, and Formulas by permission
of Drs. J. W. Dunlap and A. K. Kurtz and their publishers, the 
World Book Company. 

To the American Psychological Association, the American
Statistical Association, the Biometrics Society, the Royal Statisti- 
cal Society, the British Psychological Society, Johns Hopkins 
University, the University College of London, Williams and 
Wilkins Company, and Warwick and York, Inc., I am indebted 
for permission to quote material and to make use of data published 
in various professional journals. 

Finally, I should like to acknowledge a very special debt of 
gratitude to Barbara Jacobsen, my assistant, for the many hours 


she spent in checking my computations. 
A.L.E. 


Seattle, Washington 
February, 1950 


CONTENTS

PREFACE

CHAPTER 1. THE NATURE OF PSYCHOLOGICAL RESEARCH
1. Introduction
2. Observations
3. Stimulus variables
4. Response variables
5. Organismic variables
6. Research problems
7. Dependent and independent variables
8. Examples

CHAPTER 2. PRINCIPLES OF EXPERIMENTAL DESIGN
1. Samples in research
2. Sampling distributions
3. Randomization and experimental design
4. Tables of random numbers
5. The difference between 2 means
6. The test of hypotheses
7. Two kinds of errors
8. Practical versus statistical significance
9. Examples

CHAPTER 3. PROBABILITY AND EXPERIMENTAL DESIGN
1. The farmer's divining rod
2. A simple experimental design
3. Permutations and combinations
4. Experimental controls
5. A limitation in the design
6. Increasing the sensitivity of the experiment
7. The binomial expansion
8. An experiment on taste
9. Examples

CHAPTER 4. THE NORMAL AND χ² APPROXIMATIONS OF THE BINOMIAL PROBABILITIES
1. Introduction
2. The normal distribution
3. Relation of the binomial to the normal distribution
4. Parameters of the binomial distribution
5. Approximations of the binomial probabilities from the table of the normal curve
6. Evaluation of the experiment on taste
7. A problem in opinion polling
8. The χ² distribution
9. The relation between z and χ² for 1 degree of freedom
10. Summary
11. Examples

CHAPTER 5. EXPERIMENTS INVOLVING A COMPARISON OF THE DIFFERENCE BETWEEN 2 FREQUENCIES OR PROPORTIONS
1. Introduction
2. The null hypothesis
3. Standard error of the difference between 2 uncorrelated proportions
4. One-tailed and two-tailed tests of significance
5. The χ² test for the difference between uncorrelated proportions
6. The correction for continuity
7. Another method for calculating χ²
8. Correlated proportions
9. Standard error of the difference between correlated proportions
10. The χ² test for correlated proportions
11. Correcting for continuity: correlated proportions
12. The influence of correlation on the test of significance
13. Examples

CHAPTER 6. THE APPLICATION OF THE χ² DISTRIBUTION TO RESEARCH PROBLEMS INVOLVING MORE THAN 1 DEGREE OF FREEDOM
1. Introduction
2. A study of preferences
3. A study of industrial accidents
4. A study of vocational advisement
5. The j × 2 table
6. Planning the comparisons to be made
7. An experiment involving a test of technique
8. χ² for more than 30 degrees of freedom
9. Examples

CHAPTER 7. TESTING HYPOTHESES ABOUT CORRELATION COEFFICIENTS
1. Introduction
2. The sampling distribution of the correlation coefficient
3. The normal curve test of the hypothesis of zero correlation
4. The t test of the hypothesis of zero correlation
5. Table of significant values of the correlation coefficient
6. The z' transformation for the correlation coefficient
7. Establishing the fiducial limits
8. Testing the significance of the difference between 2 correlation coefficients
9. Finding an average value of the correlation coefficient for 2 samples
10. An average value of the correlation coefficient based upon several samples
11. Testing the hypothesis that several samples are from a common population
12. The fiducial limits for an average value of the correlation coefficient
13. A correction for a systematic bias in averaging z' values
14. Examples

CHAPTER 8. THE t TEST AND THE SIGNIFICANCE OF MEANS AND DIFFERENCES BETWEEN MEANS
1. The sampling distribution of the mean
2. The t distribution
3. The fiducial limits for the mean
4. Increasing the size of the sample
5. Fiducial probability
6. An experiment on retention
7. The standard error of the difference between 2 means
8. Testing the null hypothesis
9. The fiducial limits for the difference between 2 means
10. Estimating the size of the sample for a repetition of the experiment
11. The influence of changes in C? on the subsequent value of t
12. Examples

CHAPTER 9. HETEROGENEITY OF VARIANCE AND THE t TEST
1. Introduction
2. The F distribution
3. Testing for homogeneity of variance
4. The effects of nonnormality
5. Testing the hypothesis of a common population mean when n₁ and n₂ differ
6. Obtaining the value of t which will be regarded as significant
7. The influence of heterogeneity of variance upon the t test
8. Testing the hypothesis of a common population mean when n₁ equals n₂
9. Examples

CHAPTER 10. AN INTRODUCTION TO THE ANALYSIS OF VARIANCE
1. Introduction
2. The partitioning of the total sum of squares for r random samples of n cases
3. The mean square within groups
4. The mean square between groups
5. Independence of the mean squares
6. The test of significance
7. An analysis of variance for 2 groups
8. Some problems to which the analysis of variance might be applied
9. An experiment involving 5 experimental conditions
10. Summary of the calculations
11. Examples

CHAPTER 11. HETEROGENEITY OF VARIANCE AND TRANSFORMATIONS OF THE SCALE
1. Introduction
2. The test for homogeneity of variance: equal n's
3. The test for homogeneity of variance: unequal n's
4. Transformations
5. The square root transformation
6. Two examples of the application of the square root transformation
7. The logarithmic and angular transformations
8. The reciprocal transformation
9. The analysis of variance without transformation
10. Examples

CHAPTER 12. THE 2ⁿ FACTORIAL DESIGN FOR EXPERIMENTS IN WHICH VARIABLES ARE VARIED IN ONLY 2 WAYS
1. Introduction
2. The 2 × 2 factorial design
3. Testing for homogeneity of variance
4. Partitioning the sum of squares between groups
5. Calculation of the interaction sum of squares based on 1 degree of freedom
6. Allocation of the degrees of freedom
7. Interpretation of the experiment
8. The 2 × 2 × 2 factorial design
9. Sums of squares for the main experimental variables
10. The interaction sums of squares
11. The interpretation of the experiment
12. Testing whether one mean square is significantly smaller than another
13. A method of showing the comparisons in the 2ⁿ factorial design
14. Orthogonal comparisons
15. Some advantages of the factorial design
16. Examples

CHAPTER 13. COMPLEX FACTORIAL DESIGNS
1. Introduction
2. A 4 × 3 × 2 factorial design
3. Calculation of the sum of squares
4. Direct calculation of a second- or higher-order interaction
5. Summary of the analysis
6. The use of interactions as error terms instead of the usual estimates of error
7. Factorial designs without replication
8. The assumptions involved in pooling higher-order interactions
9. An example of an analysis without replication
10. Testing the interaction mean squares for homogeneity of variance
11. Summary
12. Examples

CHAPTER 14. EXPERIMENTAL DESIGNS INVOLVING MATCHED GROUPS
1. An experiment described
2. A change in the design
3. Analysis of the data
4. The nature of the residual sum of squares
5. The efficiency of designs involving matched groups
6. The matching variables
7. An analysis of several matched groups
8. The t test applied to 2 matched groups
9. The importance of considering possible interactions
10. Examples

CHAPTER 15. EXPERIMENTAL DESIGNS INVOLVING REPEATED MEASUREMENTS OF THE SAME SUBJECTS
1. A problem in experimental design
2. Testing the significance of practice effects for a single group given 2 trials
3. The significance of practice effects for a series of trials
4. The analysis of repeated measurements on several independent groups
5. Calculation of the sums of squares
6. The tests of significance
7. Relating the analysis to previous methods
8. Examples

CHAPTER 16. THE LATIN SQUARE DESIGN IN PSYCHOLOGICAL RESEARCH
1. An experiment in color recognition
2. The Bliss and Rose experiment
3. A method for assigning treatments in the Latin square
4. The analysis of variance for the Latin square
5. The direct calculation of the residual sum of squares
6. Applications of the Latin square design
7. Combining the factorial design with the Latin square
8. The nature of the row mean square
9. An experimental design with replication of the same Latin square
10. Analysis of the independent observations
11. Analysis of the correlated observations
12. Summary of the analysis
13. Examples

CHAPTER 17. APPLICATIONS OF THE ANALYSIS OF COVARIANCE
1. Introduction
2. The analysis of covariance
3. Correlation and regression
4. Partitioning the total sum of cross products
5. The sums of squares of errors of estimate
6. An application of the analysis of covariance
7. Interpretation of the analysis
8. Another application of covariance analysis
9. Interpretation of the analysis
10. The use of the analysis of covariance in research
11. Examples

BIBLIOGRAPHY
LIST OF FORMULAS
APPENDIX
TABLE I. TABLE OF RANDOM NUMBERS
TABLE II. SQUARES, SQUARE ROOTS, AND RECIPROCALS OF NUMBERS FROM 1 TO 1,000
TABLE III. AREAS AND ORDINATES OF THE NORMAL CURVE IN TERMS OF x/σ
TABLE IV. TABLE OF χ²
TABLE V. TABLE OF t
TABLE VI. VALUES OF r AT THE 5 AND 1 PER CENT LEVELS OF SIGNIFICANCE
TABLE VII. TABLE OF z' VALUES FOR r
TABLE VIII. THE 5 AND 1 PER CENT POINTS FOR THE DISTRIBUTION OF F
ANSWERS TO EXAMPLES
INDEX OF NAMES
INDEX OF SUBJECTS




CHAPTER 1 


The Nature of Psychological Research 


1. INTRODUCTION 


The results of experiments are ordinarily reported in terms 
of frequencies, proportions, means, variances, correlation coef- 
ficients, standard errors, and other statistical measures. On the 
basis of these values, computed from the observations made during 
the course of the experiment, we wish to draw certain conclusions 
or inferences. But the validity of such inferences depends upon 
certain conditions which must go into the design of the experiment. 
We cannot introduce these conditions after the data have been 
collected. They must be given consideration in the planning of 
the experiment. In this book we shall be concerned with the 
planning of experiments and with the analysis of experimental 
data, primarily in the fields of psychology, sociology, and education. 


2. OBSERVATIONS 

Psychological research is concerned with three kinds of vari- 
able: stimulus variables, response variables, and organismic vari- 
ables. By observation and experimentation, the psychologist 
attempts to describe and, when possible, to quantify or meas- 
ure these variables, and to study the relationships between 
them. 

A quantitative series of observations may be obtained by 
measurement; for example, when the stimulus under observation 
consists of a beam of light and the intensity of the light is varied 
and measured. The differences in the intensity of the light are 
matters of degree and these may be measured, giving rise to a 
series of quantitative observations. This series is also continuous 
in that we could theoretically increase or decrease the intensity of 
the light by infinitesimal amounts. It is also true that no matter
how fine we might make the differences in intensity, they could 
always theoretically be made finer. That they are not may simply 
be a function of our measuring instrument, or it may be due to the 
fact that we had no need for a more precise observation.
Consequently, we may note that observations of a quantitative and
continuous variable are approximate and not exact.

If we were interested in a problem involving a light stimulus, 
we might keep the intensity of the light constant and vary the 
number of times the light was presented. A series of such observa- 
tions would be quantitative but not continuous. The number of 
times that the light can be presented can increase or decrease only 
by integral numbers and not by small fractions. We may present 
the light 4 times or 5 times, but not 4¼ or 4½ times. This series
of observations would be quantitative and discrete. Quantitative 
data obtained, as they were here, by counting are often referred 
to as enumeration or frequency data. Such data differ from the 
continuous data obtained by measurement in that they are exact. 
No approximation is involved, for example, in counting the num- 
ber of times the stimulus was presented; if no error occurred in 
making the observations, the frequency of presentation is known 
exactly. 

The difference between a continuous and a discrete series 
of observations may be illustrated also in terms of a response vari- 
able. We might observe the number of times a particular response 
occurs to the light stimulus. For example, in a conditioning ex- 
periment, if the light stimulus had always been followed by an 
electric shock to the finger in a preliminary series of trials, we might 
count the number of times the finger is flexed to the light stimulus 
when it is presented alone during a series of critical trials. This 
result would give us a quantitative and discrete series of
observations of the response. On the other hand, we might easily record
our observations of the finger response by measuring the amplitude 
of the finger flexion and thus obtain a quantitative and continuous 
series of observations. 

Our interest might be in the ease with which we may establish 
conditioned finger flexion in the right hand as compared with a 
finger-flexion response in the left hand. The response variable 
which we observe would thus be of a qualitative kind. These two
qualitatively different responses, of course, might be recorded in a 
quantitative continuous or a quantitative discrete series. 

As we shall see, it is possible to quantify some kinds of 
qualitative variables, and, under certain conditions and with 
certain corrections, discrete variables may be treated as if con- 
tinuous. 


3. STIMULUS VARIABLES 


The stimulus variables in a psychological experiment may 
consist of relatively simple things, such as electric shock, light, 
sound, pressure, or temperature. These may be quantified by 
measuring the physical intensity of the stimulus. But there are 
other stimulus variables in which the psychologist is interested for 
which we have no measures corresponding to physical intensity. 
These may consist of problem-solving situations, motor conflict 
situations, social situations, and so forth, and these are relatively 
more difficult to quantify. We have not as yet devised techniques 
for measuring these stimulus variables comparable to our tech- 
niques for measuring the intensity of light or of electric shock. 

Approaches to quantification have been made, however. A 
series of problem-solving situations might be arranged in order of 
difficulty, and the successive integers from zero on might be as- 
signed to the problems on the basis of their ease of solution. To 
do this would require, of course, that we obtain ahead of time some 
indication of the difficulty of each of the problem-solving situations. 
This could be accomplished, in a somewhat crude and rough 
manner, by having a group of judges rate the problems. Better 
yet, we might actually try the problems out with a group of sub- 
jects and find the number or percentage capable of solving each. 
On the basis of the observed data the problems could be ranked or 
weighted as to level of difficulty. A quantification system such as 
this would mean that the difficulty levels of the problems constitute 
a discrete series, for the minimum increase or decrease in difficulty
could only be in terms of successive integers.

Suppose, however, that we add the frequencies and divide by
the number of problems to obtain the average level of difficulty of
the set. The average thus obtained may very well represent an
impossible value in terms of the discrete series. For example, it
might be 8.4 or some other such figure. This figure may be made
meaningful by considering the discrete series as being paralleled 
by an underlying continuum. Each discrete value of the initial 
series of observations would thus correspond to an interval on the 
underlying continuum. A frequency of 8, for example, would cor- 
respond to the interval 7.5 up to 8.5 on the underlying continuum, 
and a frequency of 9 would correspond to the interval 8.5 up to 9.5 
on the underlying continuum, and so on. For purposes of con- 
venience, many discrete series are referred to a continuum, as we 
shall see later. 
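The ranking procedure and the underlying-continuum reading of discrete values can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a procedure given in the text: the problem labels and solver counts are hypothetical, the integer 0 is assumed to go to the easiest problem (the one solved by the most subjects), tied counts simply keep their input order, and the function names are made up for the example. Each discrete value is read as the interval from .5 below up to .5 above it, so a point such as 8.4 falls within the interval belonging to 8.

```python
import math


def rank_by_difficulty(solved_counts):
    """Assign successive integers from 0 on, easiest problem first.

    solved_counts maps a (hypothetical) problem label to the number of
    subjects who solved it; more solvers is taken to mean an easier
    problem. Tied counts keep their input order because sorted() is stable.
    """
    easiest_first = sorted(solved_counts, key=solved_counts.get, reverse=True)
    return {label: rank for rank, label in enumerate(easiest_first)}


def real_limits(value):
    """Interval on the underlying continuum represented by a discrete value."""
    return (value - 0.5, value + 0.5)


def discrete_value_for(x):
    """Discrete value whose interval, from .5 below up to .5 above, contains x."""
    return math.floor(x + 0.5)


counts = {"P1": 18, "P2": 25, "P3": 9, "P4": 14}  # hypothetical solver counts
print(rank_by_difficulty(counts))  # {'P2': 0, 'P1': 1, 'P4': 2, 'P3': 3}
print(real_limits(8))              # (7.5, 8.5)
print(discrete_value_for(8.4))     # 8
```

Using `math.floor(x + 0.5)` keeps the lower limit inside each interval and the upper limit outside it, matching the text's "7.5 up to 8.5" for 8 and "8.5 up to 9.5" for 9.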

An approach similar to that described above might be made 
toward the quantification of other complex stimulus situations, 
and such approaches are often made in psychological research. In 
many cases, however, the stimulus variables in psychological 
research constitute a qualitative series. Thus, we have experi- 
ments concerned with differences in retention when material is 
presented orally and when it is presented visually. We investigate 
speed and accuracy of perception when figures are presented on 
variously colored backgrounds. We experiment with teaching 
methods by trying lectures versus discussions. We test the efficacy 
of various methods of psychotherapy. In each of these cases we 
have a qualitative stimulus variable. 


4. RESPONSE VARIABLES 


In observing response variables, we may have under investi- 
gation relatively simple responses such as finger flexions, knee 
jerks, pupillary responses, heart beats, respiratory changes, and so 
forth. At the other extreme, we have such complex response 
patterns as the motor behavior involved in typewriting, tennis 
playing, sending telegraphic code, or the even more complicated
behavior involved in aggression, dominance, leadership, and social 
adaptability. 

We have already found that a simple stimulus variable is 
quantified more readily than one which is relatively complex.
This is true with respect to response variables also. Thus, we
have elaborate apparatus for recording and quantifying in a
continuous series a relatively simple response such as the dilation of
the pupil of the eye or the flexion of a finger. As responses be- 
come more complex, we may still have quantitative observations 
but they are discrete rather than continuous. In measuring 
typewriting skill, for example, we may record the number of words 
typed per unit of time. In studying stylus maze learning of the 
human subject, we count the number of errors made per trial. We 
count the number of aggressive responses made by a child in a 
social situation or we count the number of times the child with- 
draws in response to the advances of another child. We have a 
quantitative series of observations, but they are discrete and not 
continuous. 

We attempt, however, to obtain a continuous series of 
observations of the more complicated response patterns by a 
variety of techniques. We devise rating scales and ask judges 
to rate on a continuum the degree of aggressiveness or submissive- 
ness exhibited by a child in a given situation. Although these 
ratings may be made on a discrete scale, we often assume that they 
represent a continuous series. Thus, if the rating scale contains 
but five discrete categories, ranging from 0 to 4, with 0 representing 
a minimum degree of aggressiveness and 4 representing a maximum,
and a given child obtains a rating of 3 on the scale, we may take 
this rating to represent an interval. We do not treat the rating 
as an exact figure of 3, for example, but instead assume that the 
rating may represent an interval ranging from 2.5 up to 3.5, or, 
in other words, the rating of 3 is taken as an approximate measure- 
ment rather than as an exact, discrete value. We do this because 
we grant the logic of recognizing that aggressiveness does not
increase or decrease by successive or discrete values, but rather 
that human beings may be thought of as falling on a continuum 
separated by degrees of aggressiveness. That we are not able to
locate subjects more precisely on this continuum, but only in terms
of an interval ranging from .5 of a unit below to .5 of a unit above
the recorded value, is a result of our technique of observation. The
apparent discreteness of our observations is not believed to be an
inherent characteristic of the variable observed.

¹ One need only examine an introductory text in general psychology to
discover the instruments we have devised for measuring relatively simple
responses. We have psychogalvanometers for measuring changes in skin
resistance, dynamometers for measuring strength of hand grip, pneumographs
for measuring changes in respiration, plethysmographs for measuring
vasodilation or vasoconstriction of a part of the body, sphygmomanometers
for measuring changes in blood pressure, electrocardiographs for measuring
changes in heart contraction, and so on.

In the same way we often treat scores on psychological tests 
as representing approximate measurements rather than exact, 
discrete values. We may count the number of items responded 
to in a particular way on a test and, although these counts are 
discrete, we again recognize that the discreteness is an artifact of 

our method of observation. We might just as well have assigned 
fractional values to the various items in the test and thus have 
obtained scores which were separated by smaller values than we 
obtained by assigning values of unity to each item responded to
in a particular manner. The method of assigning weights to items 
in a test is often an arbitrary matter, and hence, though it may 
be more convenient to assign simple weights of unity, we treat 
the sum of these weights, the score, as belonging to a continuous 
series. A score of 18, for example, is treated as if it represented 
an interval ranging from 17.5 up to 18.5 and thus represented an 
approximate value in a continuous series of measurements rather 
than a precise, exact value of 18.

Responses to an item in a test might be in terms of whether 
the subject strongly agrees, agrees, is undecided, disagrees, or 
strongly disagrees with the item. How do we quantify such
responses? Again it is an arbitrary matter, but for convenience
we often assign weights involving the successive integers from 0
to 4 to such item responses. Although these responses constitute
a qualitative series, we arbitrarily quantify them by assigning
some such weights. We may also reason that "agreement" or
"disagreement" is a matter of degree and that our observations 
have been forced into the discrete categories by the nature of the 
testing instrument. Thus, if a response is given a weight of 3, this 
may also be taken to represent an interval ranging from 2.5 up to 
3.5. In like manner we treat the sum of the item responses, the
score on the test, as representing an interval ranging from .5 of a 
unit below to .5 of a unit above the recorded score. The value of 
procedures such as this is testified to by relating the scores obtained
upon such tests to other variables. For example, we might find
that a series of 10 such items, when arbitrarily weighted in the 
manner described, yields a series of scores for a group of salesmen 
that relate very well to actual success as a salesman as measured 
in terms of yearly sales records. 
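The arbitrary weighting of agree-disagree responses, and the treatment of the summed score as an interval, can be illustrated with a short Python sketch. The text fixes only the range of weights (0 to 4); the direction used here (strongly disagree = 0, strongly agree = 4), the ten responses, and the function names are assumptions made for the illustration.

```python
# Hypothetical weighting: the text fixes only the range 0 to 4, so the
# direction (strongly disagree = 0, strongly agree = 4) is an assumption.
WEIGHTS = {
    "strongly agree": 4,
    "agree": 3,
    "undecided": 2,
    "disagree": 1,
    "strongly disagree": 0,
}


def score_responses(responses):
    """Sum the arbitrary item weights to give the total score on the test."""
    return sum(WEIGHTS[response] for response in responses)


def score_interval(score):
    """Treat a total score as the interval from .5 below to .5 above it."""
    return (score - 0.5, score + 0.5)


if __name__ == "__main__":
    # Ten hypothetical item responses from one subject.
    answers = ["agree", "strongly agree", "undecided", "agree", "disagree",
               "agree", "strongly agree", "undecided", "agree",
               "strongly disagree"]
    total = score_responses(answers)
    print(total, score_interval(total))  # prints: 25 (24.5, 25.5)
```

Reversing the direction of the weights would change every score to 40 minus its old value for a 10-item test, which is one reason the choice can be called arbitrary.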


5. ORGANISMIC VARIABLES 


Organismic variables arise from the ways in which organisms 
may be classified and from the observations and measurements of 
physical, physiological, and psychological characteristics of organ- 
isms. For example, we may measure the heights or weights of a 
group of individuals, and the resulting measurements would con- 
stitute a quantitative and continuous series of observations. 
These observations are not response variables, nor are they stimu- 
lus variables; but they may be conveniently described as organ- 
ismic variables. They are characteristic ways in which the 
particular group of organisms under observation vary. Similarly, 
organisms may be classified as to the color of their hair or eyes, 
and these classifications would constitute a series of qualitative 
observations.

We may also classify individuals in terms of their educational 
levels. Some will have no schooling, some will be grammar-school 
graduates, some high-school graduates, and some college graduates. 
This series of observations might be arbitrarily quantified by 
assigning the successive integers from 0 on to the various levels of 
education. Or we might simply record the number of years of 
schooling and assign these quantitative values to the observa- 
tions. 

Individuals may also be classified according to their sex, and 
this gives us a qualitative series. But this qualitative series is 
often, for reasons of convenience, quantified by assigning a value 
of 0 to one sex and a value of 1 to the other. This quantification 
makes possible many valuable statistical procedures that would 
otherwise not be possible. For example, by making such a 
quantification we have developed techniques for measuring the 
degree to which sex classification is related to some other variable 
such as performance on a test. Or again, employees may be 
classified as satisfactory and unsatisfactory in a given job. By 


8 EXPERIMENTAL DESIGN IN PSYCHOLOGICAL RESEARCH 


arbitrarily assigning a value of 0 to one classification and a value 
of 1 to the other, we are able to measure the degree to which this 
classification is related to job performance as measured, let us say, 
by the number of units of work produced during a given period of 
time. 
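Once the two job classifications are coded 0 and 1, the ordinary product-moment correlation can be computed against the number of units produced; with one variable coded this way, the coefficient is what is usually called the point-biserial correlation. The Python sketch below uses hypothetical output figures for eight employees and a hand-written `pearson_r` helper (an illustrative name, not a library function) so that it runs on any Python 3 version.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (n >= 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: 1 = satisfactory, 0 = unsatisfactory (arbitrary coding).
rating = [1, 1, 1, 1, 0, 0, 0, 0]
units_produced = [52, 48, 55, 50, 41, 44, 39, 46]

r = pearson_r(rating, units_produced)
# Swapping which class gets 0 and which gets 1 flips the sign of r only.
r_swapped = pearson_r([1 - code for code in rating], units_produced)
print(round(r, 3), round(r_swapped, 3))
```

Because the assignment of 0 and 1 is arbitrary, only the size of the coefficient carries information about the strength of the relation; its sign merely records which class was coded 1.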

Rats in an experiment on learning may be characterized as 
hungry or thirsty, and we have here a qualitative series of
observations or characteristics of the rats, for this is a difference in
kind, not in degree. But we may quantify within these qualitative
differences by designating some rats as more hungry than others 
and some rats as more thirsty than others. Such designations 
would probably be based, of course, upon prior knowledge that 
some of the rats had not been fed or given water in 24 hr., whereas 
others had been fed or permitted to drink water as recently as 
12 hr. before the experiment. With this knowledge we might 
arbitrarily assign weights of 0 and 1 to the 12- and 24-hr. periods 
of deprivation, respectively. Or we might use the time intervals 
12 and 24 as our quantitative measures. Which we use is not 
important to the discussion at this point. We are merely inter- 
ested in illustrating the possibilities of quantification. 

Let us examine some additional organismic variables.
Organisms are often classified on the basis of some prior obser- 
vations of performance or response. Such differences or classi- 
fications may often be regarded for convenience as organismic 
variables or as characteristics of the organisms whose behavior in 
some other respect is under investigation. Thus, upon the basis 
of performance upon psychological tests, we sometimes classify 
organisms as bright, average, and dull, or perhaps into even finer 
gradations. In response to an item in a public-opinion poll, 
subjects may be classified as those who are in favor of the issue, 
those who are against the issue, and those who are undecided on 
the issue. Or we may classify students in an investigation as 
those who have had a course in calculus and those who have not.

In studying behavior characteristics of young children we 
may classify them into two groups: those who were breast-fed 
and those who were bottle-fed. In studying the response of adults
to some social issue, we may find it convenient to classify them into 
those who voted for Roosevelt in 1944 and those who did not. In 


THE NATURE OF PSYCHOLOGICAL RESEARCH 9 


other cases we may classify individuals as to the size of the town 
in which they were brought up, or in terms of the size of the high 
school from which they graduated, or in terms of their nationality
backgrounds. 

As with the response and stimulus variables, some of these 
organismic variables lend themselves more easily to quantification
than others. We have, in our brief discussion, merely pointed to 
some of the possibilities. 


6. RESEARCH PROBLEMS 


We have stated previously that it is the concern of the psy- 
chologist and other scientists who are interested in the behavior 
of organisms to describe and study the relationships between the 
variables enumerated. Much of psychological research is con- 
cerned with attempts to improve our methods of description of 
the stimulus, response, and organismic variables by devising 
elaborate apparatus and techniques for more precise measurement. 
In the hands of other psychologists these devices, or, in their 
absence, relatively crude observations, become but part of the 
general problem of prediction. It is probably a valid assumption 
that as our measuring devices of all three variables improve, our 
predictions will also improve. 

Though it is often said that the main interest of the psy- 
chologist is in the prediction of response or behavior, this does not 
by any means exhaust the possibilities. Keeping in mind the 
three kinds of variables discussed, we may diagrammatically show 
the major problems of prediction in which psychologists are 
interested. We shall refer to these problems as cases. 

CASE 1: $O_1 \rightarrow O_2$. Case 1 problems are concerned with the
relationship between two organismic variables. We may vary height,
for example, and observe whether these variations are associated 
with variations in weight. Or we may vary the sex of the organ- 
isms and observe whether this classification is associated with 
changes in height. 

CASE 2: $R_1 \rightarrow R_2$. Case 2 problems are concerned with the
relationship between two response variables. An example would be 
observing the response of organisms to a test and then determining 
whether these responses are related to response (performance) on
a job. Another common illustration is studying the relationship 
between responses of subjects to two different tests. 

CASE 3: $O_1 \rightarrow R_1$. If we were to classify a group of
children into those who were breast-fed and those who were bottle- 
fed and then attempt to determine whether the bottle-fed children 
were more aggressive than breast-fed children, we would be con- 
cerned with a Case 3 problem. In another experiment, rats might 
be classified as hungry and not-hungry and maze performance 
might be studied to determine whether there was any difference in 
the response of the two groups. Subjects in an opinion poll may 
be classified according to educational level and differences in the 
response of the various levels to an opinion issue may be investi- 
gated. Case 3 problems, it may be noted, are concerned with the 
relationship between an organismic variable and a response variable. 

CASE 4: $R_1 \rightarrow O_1$. Case 4 problems are just the reverse
of Case 3 problems. Here we study the response differences between 
organisms and attempt to relate these to an organismic variable. Can 
we predict from response to a test whether an individual will be 
classified as “normal” or “neurotic”? Can we predict from 
response to an opinion issue whether the individual is to be classi- 
fied as a Democrat, Socialist, or Republican? 

CASE 5: $R_1 \rightarrow S_1$. Case 5 problems are not frequently
encountered in psychological research, but an example is the study
by Sherman (1928).2 Infants were subjected to a number of
different stimuli, and their responses recorded by means of motion
pictures. It was believed that each stimulus gave rise to a par-
ticular emotional reaction. It was found, however, that when observ-
ing the reactions of the infants alone, judges were unable to predict
successfully the nature of the stimulation. In Case 5 problems,
the investigation is concerned, as was Sherman's study, with attempts
to predict the nature of the stimulus from observations of the response.

CASE 6: $S_1 \rightarrow R_1$. Case 6 problems are among those
most frequently found in psychological research. In these problems
the relationship between a stimulus variable and a response variable
is studied. We vary the intensity of a light stimulus and observe
the speed of reaction. Or we vary the stimulus qualitatively by
presenting a visual and an auditory signal and study speed of
reaction. The size of type in which words are printed may be
varied, and we then determine whether there are differences in
response (perhaps recognition) to the different type sizes.

2 References are cited by author and by date and may be found in the
list beginning on p. 359.

CASE 7: $S_1 \rightarrow O_1 \rightarrow R_1$. Case 7 problems are some-
what more complicated than the other problems discussed in that
we attempt to predict a response variable from a stimulus variable
operating in conjunction with an organismic variable. For example,
we might be concerned with reaction time (response variable) to
a visual and an auditory signal (stimulus variable) of subjects who
have been deprived of food for a given period of time and subjects
who have not been deprived of food (organismic variable). Or
in a methods experiment in education, we may vary the method of
instruction (stimulus variable) in several ways. With each method
of instruction we might use as subjects children of varying levels
of ability (organismic variable). Achievement (response variable)
of these subjects under the differing stimulus conditions might be
investigated.

Now the classification of research problems in terms of the
7 cases described above is simply a useful and convenient way of
viewing the kinds of problems with which the psychologist is
concerned. It may seem that certain complicated research
problems fail to fall under any of the cases. Let us suppose, for
example, that an experiment is designed in which several stimulus
variables are involved. We are interested in the recognition
(response variable) of words exposed under a variety of conditions.
One stimulus variable might be the size of type in which the
words are printed, and this is varied in 2 ways by printing the
words in 6-pt. type and 12-pt. type. Type size, then, is a quanti-
tative stimulus variable. But we might also be interested in
another stimulus variable: Are words more easily recognized when
printed in black type on a white background or when printed in
white type on a black background? Here we have a qualitative
stimulus variable. Still another stimulus variable might be of
interest, the exposure time. Will subjects recognize words more
easily when the words are exposed for 120 m. sec. than when the
words are exposed for 60 m. sec.? This is a quantitative stimulus
variable. With respect to these stimulus conditions, we may raise
the question as to whether differences will be present in the re-
actions of men and women. The sex classification of the subjects
in the experiment constitutes the organismic variable. How is
this particular problem to be investigated and classified in terms
of the 7 cases described?

3 For a different emphasis, see Spence (1944, 1948).

The investigation of the interrelationships between the 
variables described might be undertaken in a number of different 
ways. The experimenter might break his problem up into a 
number of separate experiments. Background, time of exposure, 
and sex might be held constant in one of the experiments and only 
type size varied. For this particular study, then, we would have
a Case 6 problem. In another phase of the investigation, type
size, time of exposure, and sex might be held constant and back- 
ground varied. This is obviously another Case 6 problem. 
Similarly, the stimulus variables might be held constant, and the 
responses of a group of men and a group of women investigated, 
giving a Case 3 problem. If we varied type size and held the 
other stimulus variables constant and studied the reactions of men 
and the reactions of women separately, we would have a Case 7 
problem. 

The same problem might be investigated in terms of factorial 
design, i.e., we might study all possible combinations of the stimulus 
and organismic variables in the same experiment. Type size
would be varied in 2 ways, and each size could be accompanied
by each of the 2 backgrounds, giving 4 experimental conditions.
With time of exposure varied in 2 ways also, we would have 8
experimental conditions. Combining these with an organismic
variable which is varied in 2 ways, we have a total of 16 experi-
mental conditions. One such condition would be 6-pt. type,
printed in black on a white background, with exposure time of
60 m. sec., and women subjects. The 15 other experimental con-
ditions would be composed of the other possible combinations of our
stimulus and organismic variables. An experiment carried out in 
this fashion would be called a 2 x 2 x 2 x 2 factorial design and 
would be classified as a series of Case 7 problems. 
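Because the factorial design uses every combination of one level from each variable, the 16 conditions can be listed mechanically. The Python sketch below does this with `itertools.product`. The variable names and the order in which the levels are listed are illustrative assumptions and are not part of the text.

```python
from itertools import product

# Two levels for each of the four variables in the word-recognition example.
type_size = ["6-pt.", "12-pt."]
background = ["black on white", "white on black"]
exposure_msec = [60, 120]
sex = ["women", "men"]

# Each tuple is one experimental condition of the 2 x 2 x 2 x 2 design.
conditions = list(product(type_size, background, exposure_msec, sex))

print(len(conditions))  # 16
print(conditions[0])    # ('6-pt.', 'black on white', 60, 'women')
```

The first tuple is the condition singled out in the text. Adding a third level to any one variable would raise the count from 16 to 24.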


7. DEPENDENT AND INDEPENDENT VARIABLES


In each of the 7 cases described, we may think of the variable at
the extreme right, on which the point of the arrow in the diagram
impinges, as the predicted variable. In experimental
and research work this variable is often called the dependent
variable. The variables to the left of the dependent variable are
called the independent variables. The problem of research then
is to determine whether or not changes or differences in the inde-
pendent variables are associated with changes or differences in the
dependent variable in any lawful fashion. Observations of the
independent variables thus afford us a basis for prediction, pro-
vided, of course, that some systematic differences or changes are
observed in the dependent variable when the independent variables
are systematically varied.


8. EXAMPLES


1. Examine a recent issue of a journal devoted to the publi-
cation of psychological, educational, or biological research. (a)
Classify the articles in the journal in terms of the cases described
in the chapter. (b) Are there any articles which cannot be
classified? (c) Can you think of a possible $S \rightarrow O$ problem?

2. Make a list of at least 5 stimulus variables, 5 response
variables, and 5 organismic variables other than the ones mentioned
in the chapter. (a) What available methods are there for quanti-
fying each of the variables? (b) How might the variables for
which methods are not available be quantified?

3. Assume that one of the variables in a research problem is
socioeconomic status. If you were constructing an index or a test
for this variable, what factors, in addition to income, would you
want to take into consideration?

4. A fortunate basketball coach at a small college once had 
5 players trying for the position of center on the college team. The 
members of the coaching staff were unable to differentiate between 
the abilities of the 5 players. What situation tests might have 
been developed to yield quantitative data concerning the ability
of each player for the position of center?

5. A graduate school awards a number of research fellowships 
in psychology to students who are believed to be outstanding. 
Assume that the awards are to be based primarily upon the poten-
tiality of the student to do research in the field of psychology.
(a) What factors should be taken into consideration in making
the awards? (b) What methods might be devised for quantifying
this variable?

6. An experimenter carries out an $S \rightarrow R$ problem. Dis-
cuss the conditions under which S can be used to predict R.

7. Distinguish between a dependent and an independent 
variable; between a qualitative and a quantitative variable. 




CHAPTER 2 


Principles of Experimental Design 


1. SAMPLES IN RESEARCH 


The complete set of observations upon which an experiment 
is based is usually called a sample of N, where N refers to the 
number of observations. The particular group of observations 
corresponding to one phase of an investigation, such as the obser-
vations obtained under the set of experimental conditions 6-pt. type,
60 m. sec. exposure, white on black background, and male subjects
in the problem discussed in the last chapter, would be
called a sample of n observations. The sum of all sets of n obser-
vations will be equal to N, the total number of observations.

The sample of observations is usually assumed to be repre-
sentative of a much larger number of possible observations or
measurements that might be made under the same experimental 
conditions. This larger group of potential observations is called 
the population. A population does not necessarily refer to in- 
dividual persons. A population might consist of measurements of 
height or of weight; it might consist of psychogalvanic responses 
obtained under specified conditions or of test scores; or we might 
refer to a population of error scores made in learning a maze. An 
industrial-research problem might be concerned with the popula- 
tion of spools of thread produced by a given machine. We might 
refer to the population of measurements of height obtained from a 
single individual by repeated measurements with the same mea-
suring instrument. So, likewise, the results of a single experiment
may be thought of as the results of but one of a population of such
experiments.1 The population would here refer to the results
which might be obtained if the same experiment were repeated 
under the same conditions an indefinitely large number of times. 

1 Fisher (1936), p. 3.

One method which is currently used to check the representa- 
tiveness of samples may be described. In the standardization of the 
Stanford-Binet test many children were tested. The occupational 
levels of the parents of the children in the sample were checked 
against the occupational levels of the population—the population 
data being obtained from the U.S. Census.2 In public-opinion
polls the representativeness of the sample is checked with respect 
to educational level against the national figures available from the 
census. Similar checks might be made for such characteristics as 
occupational level, religious affiliation, and so on. 

The method of checking representativeness described de- 
pends, of course, upon a prior knowledge of the population, and this 
knowledge ordinarily refers to some characteristic or group of 
observations other than the one under investigation in the research; 
for it is obvious that if we knew the distribution of observations in 
the population under the conditions of the experiment, we should 
have no need of conducting the experiment itself. 

The assumption involved in checking representativeness in 
the manner described is that if the sample of observations is repre- 
sentative of the population in several characteristics for which the 
population distributions are known, it is also representative in 
other characteristics which are not known. The dangers involved 
here are obvious but are often overlooked. A sample may be 
representative of a population with respect to one characteristic 
but not necessarily with respect to others. College students at 
University A may be representative of all college students with 
respect to height, but not necessarily with respect to opinions on 
a social issue. If college students show a difference in ability to 
recognize words when they are printed in 6-pt. type and when 
printed in 12-pt. type and when both groups of words are exposed 
for 60 m. sec., this difference may or may not be present for sub- 
jects outside the college. The difference for noncollege students 
might be greater, smaller, or nonexistent. Or perhaps the differ- 
ence might even be in the opposite direction. Without actual 
testing of the noncollege students, we know only what we have 
found—that a difference exists for college students. Even here 
we may generalize to all college students only if those in the
particular investigation are representative of all college students
with respect to the ability under study.

2 Terman and Merrill (1937).

It may very well be true that with respect to most physio- 
logical measurements, a properly selected sample of male college
students will differ very little from the population of all males 
within comparable age ranges in the United States. This does not 
necessarily mean that the sample of physiological responses of 
college students in an experimental situation will be comparable 
to the responses that would be obtained in the population of males. 
College students are a fairly sophisticated group. Their attitudes 
toward serving as “guinea pigs” in a research problem may differ 
considerably from the attitudes of the noncollege population. 
Differences in experience, sophistication, attitudes, and perception 
may influence the nature and degree of physiological responses 
evidenced in an experimental situation. 

When an experimenter or investigator generalizes his results 
from a sample to a population, it is his responsibility to indicate 
the logic and basis of the generalization. 


2. SAMPLING DISTRIBUTIONS 


Let us suppose that we have records of academic aptitude 
test scores of 900,000 college students. These test scores constitute 
a defined population, and from this population we select, in some 
fashion, a sample of 100 test scores. For the sample, we find the 
mean, symbolized by $\bar{X}$, by summing the individual measures,
symbolized by $X$, and dividing by the value of $N$. We may define
this value in the following way:

$$\bar{X} = \frac{\sum X}{N} \qquad (1)$$

where the Greek capital sigma, written as Σ, means to add or
summate. Thus, we say that the mean equals summation X 
divided by N. Since the summation in this instance is perfectly 
clear as extending over all N values of X, the limits will not be 
written. 

Now for this sample of test scores we may also obtain another
quantity, the estimate of the population variance, symbolized by
$s^2$. In terms of a formula, this quantity is given by

$$s^2 = \frac{\sum (X - \bar{X})^2}{N - 1} \qquad (2)$$

The variance is obviously a measure of variability of the values of 
X about the sample mean. The square root of the variance is 
also a measure of variation of the values of X about the sample 
mean and is called the standard deviation. The standard deviation 
is symbolized by s and may be found by the following formula: 

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}} \qquad (3)$$
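Formulas (1) to (3) can be checked on a small set of numbers. The Python sketch below uses five invented scores, and the function names are our own. Note the divisor N - 1 in formulas (2) and (3).

```python
import math

def sample_mean(xs):
    """Formula (1): the sum of X divided by N."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Formula (2): squared deviations about the sample mean, divided by N - 1."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def sample_sd(xs):
    """Formula (3): the square root of the variance estimate."""
    return math.sqrt(sample_variance(xs))

scores = [12, 15, 9, 14, 10]    # hypothetical test scores
print(sample_mean(scores))      # 12.0
print(sample_variance(scores))  # 6.5  (deviations 0, 3, -3, 2, -2; 26 / 4)
print(sample_sd(scores))        # about 2.55
```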

Let us return our 100 scores to the population from which 
they were drawn and select another sample of 100 cases in the same 
manner as the first. For this sample we also find the mean and 
standard deviation. We return this sample and select still another 
in the same manner, find the values of the mean and standard 
deviation, and continue the process until we have an indefinitely 
large number of values of the mean and standard deviation, each 
based upon a sample of 100 cases. 

We may say that our method of sampling is unbiased and 
that any single sample mean is an unbiased estimate of the mean of 
the population, based upon all 900,000 test scores, if the mean of 
the means of the successive samples tends to approach the popu- 
lation mean more and more closely as the number of samples 
becomes larger and larger. A similar statement could be made 
about the standard deviations of the samples if the average or 
mean value of the sample standard deviations also tended to ap- 
proach the population standard deviation as the number of samples 
became larger and larger. On the other hand, a biased method of 
sampling, or a biased sample, or a biased estimate of the popula- 
tion value would be one in which the mean of the means would 
tend to differ systematically from the population mean with respect
to the characteristic (test scores) under observation.
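The repeated-sampling experiment just described can be imitated on a computer. The sketch below is only an illustration under stated assumptions. The 900,000 "test scores" are invented (normally distributed with mean 100 and standard deviation 15, values we chose), Python's pseudorandom generator stands in for a truly random selection, and 2,000 samples stand in for an indefinitely large number. The mean of the sample means lands close to the population mean, which is what the text means by an unbiased estimate.

```python
import random
import statistics

random.seed(1950)  # fixed only so the illustration is reproducible

# Hypothetical population of 900,000 scores (distribution assumed, not from the text).
population = [random.gauss(100, 15) for _ in range(900_000)]
population_mean = statistics.fmean(population)

def sampling_distribution_of_means(pop, n, num_samples):
    """Draw num_samples random samples of n cases, each sample being returned
    to the population before the next is drawn, and record each sample mean."""
    return [statistics.fmean(random.sample(pop, n)) for _ in range(num_samples)]

means = sampling_distribution_of_means(population, n=100, num_samples=2000)
mean_of_means = statistics.fmean(means)
print(round(population_mean, 2), round(mean_of_means, 2))
```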

A value calculated from a sample, such as the mean or the
standard deviation, is called a statistic, and the corresponding
measure calculated from the population is called a parameter.
Sample values, or statistics, may be expected to fluctuate from sample
to sample, whereas the population value is a single, constant
measure at any given moment in time. The frequency distribution
of statistics calculated from an indefinitely large number of
samples of a specified size is called a sampling distribution to
distinguish it from a frequency distribution of the observations in a
single sample of a specified size.

Let us take each value of the mean in a sampling distribution 
and subtract from it the mean of the sampling distribution. Then 
let us square the resulting deviations, sum, and divide by 1 less 
than the number of samples in the distribution. If we now take 
the square root of this value, we obviously have a quantity of the
kind defined by formula (3). It would be called the standard
deviation of the sampling distribution, but in order to distinguish 
this value from the standard deviation of a single sample of 
observations of a specified size, it is called a standard error rather 
than a standard deviation. The standard error is a measure of 
the variation to be expected of statistics from samples of size N 
under conditions to be specified later. 

The labor of empirically constructing sampling distributions 
for various statistics based upon samples of a specified size would 
be a tremendous job, and except in a few instances this has not 
been done. Fortunately, the form of these distributions for 
various statistics can be determined by mathematical means if 
certain conditions are met. In the few empirical checks which 
have been made, the correspondence between mathematical theory 
and empirical fact has been satisfactory. It can be shown, for 
example, that the standard error of the mean will be given by 


$$s_{\bar{X}} = \frac{s}{\sqrt{N}} \qquad (4)$$


where s is the standard deviation of a single sample obtained by 
formula (3) and N is the number of cases in the sample. 
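In the same spirit, the sketch below compares the standard deviation of many simulated sample means (the standard error as defined above, computed with the divisor 1 less than the number of samples) with the value formula (4) gives from a single sample. The population is again hypothetical (normal, mean 100, standard deviation 15), so both values should come out near 15 / 10 = 1.5.

```python
import math
import random
import statistics

random.seed(4)  # fixed only so the illustration is reproducible

# Hypothetical population of scores (invented for illustration).
population = [random.gauss(100, 15) for _ in range(100_000)]
N = 100

# Empirical standard error: standard deviation of 2,000 sample means.
means = [statistics.fmean(random.sample(population, N)) for _ in range(2000)]
empirical_se = statistics.stdev(means)

# Formula (4): the estimate from one sample, s / sqrt(N), with s from formula (3).
one_sample = random.sample(population, N)
formula_se = statistics.stdev(one_sample) / math.sqrt(N)

print(round(empirical_se, 2), round(formula_se, 2))
```

The single-sample value fluctuates more than the empirical one, since it rests on one value of s.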
The essential condition for establishing these distributions is
that of random selection. A random selection, from one point of
view, may be considered to be similar to that of chance selection.3
A sample of N observations is said to be a random sample when
all possible samples of size N have an equal chance of being selected,
the sample of N at hand being one of these possible samples. A
random selection of 100 cases from the population of 900,000
mentioned earlier could be obtained by means of a table of random
numbers. Such tables will be described later.

3 Kendall and Smith (1938) discuss this and other points of view.
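On a computer, a pseudorandom generator can play the role of a table of random numbers. In this Python sketch, `random.sample` draws 100 distinct labels from the 900,000 population members, so that every subset of 100 labels is (ideally) equally likely. The seed is fixed only to make the illustration reproducible, and a pseudorandom generator is an approximation to true random selection.

```python
import random

# Label the 900,000 population members 0, 1, ..., 899,999.
population_ids = range(900_000)

rng = random.Random(2024)
sample_ids = rng.sample(population_ids, 100)  # 100 distinct labels, no repeats

print(len(sample_ids), len(set(sample_ids)))  # 100 100
```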

When the sampling distribution of a statistic is known, it is
possible to specify the relative frequency with which we may
expect given values of the statistic to occur as a result of random
selection of a sample from the population. And knowing the
relative frequency of occurrence of the various values, we can then
make a probability statement regarding a particular occurrence.

For example, suppose we knew that, in the sampling distribution of
means of samples of 100 cases drawn from the population of test scores
mentioned, means of 123.5 and larger could be expected to
occur with a relative frequency of 2 in 100. We would then
expect, in random samples of 100 cases drawn from this population,
to obtain a value as large as or larger than 123.5 for our sample
mean about 2 times in 100 on the average. We might
say that the chances of getting a value of 123.5 or larger are 2 in
100.
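The probability statement above can be reproduced once a sampling distribution is specified. The text does not give one, so the sketch below assumes, purely for illustration, that means of samples of 100 cases are normally distributed with mean 120 and standard error 1.7. Under that assumption roughly 2 in 100 means are 123.5 or larger. `statistics.NormalDist` gives the theoretical proportion, and a large set of simulated means gives the corresponding relative frequency.

```python
from statistics import NormalDist

# Assumed sampling distribution of means (values chosen for illustration only).
sampling_dist = NormalDist(mu=120, sigma=1.7)

# Theoretical relative frequency of means of 123.5 or larger.
p_upper = 1 - sampling_dist.cdf(123.5)
print(round(p_upper, 3))  # about 0.02, i.e. roughly 2 times in 100

# Relative frequency among 100,000 simulated sample means.
draws = sampling_dist.samples(100_000, seed=7)
empirical = sum(m >= 123.5 for m in draws) / len(draws)
print(round(empirical, 3))
```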


3. RANDOMIZATION AND EXPERIMENTAL DESIGN 


Let us take a simple experiment in learning. We are interested
in the efficacy of a shock administered each time the subject
makes an error as compared with the efficacy of a reward each time
a correct response is made. A group of 20 subjects is divided by 
the experimenter into 2 groups of 10 subjects each. One group is 
given one treatment, shock when an error is made, and the other 
group is given the second treatment, a reward each time a correct 
response is made. At the end of the experiment we find that the 
group given a shock learns more efficiently, or at least so it seems, 
for this group, on the average, requires fewer trials than the second 
group to learn the problem set before them. Or perhaps our 
observations are the number of errors made, and we find that the 
shock group makes, on the average, fewer errors in a given number 
of trials than the reward group. This result, regardless of the 
size of the difference between the means of the 2 groups, could
only be interpreted as indicating that the 2 groups do differ in this
particular learning situation.

We would not know whether the difference observed was the 
result of the experimental treatment or was simply the result of 
initial differences in ability between the 2 groups. It may be 
that some bias of the experimenter (whether he was aware of the 
bias or not makes no difference with respect to the argument) was 
involved in the initial assignment of the subjects to the 2
groups.4 Subjects with higher levels of initial ability may have
been assigned consistently to the shock group. Hence, if the
treatments, shock and reward, had been reversed for the 2
groups, the group showing the fewer errors under the shock treat-
ment might very well have shown fewer errors under the condition
of reward also. The method of assignment of the subjects, which
permits a conscious or unconscious bias of the experimenter to
operate, prevents a clear-cut interpretation of the results of the
experiment.

A well-designed experiment would immediately eliminate this
source of difficulty by assigning the subjects at random to the two
experimental conditions. There are various ways in which this
could be accomplished. We could number the subjects from 0 to
19 and number also a pack of cards from 0 to 99. Then each
individual may be represented by 5 cards. We shuffle the cards
thoroughly and assume that the shuffling results in a random ar-
rangement of the cards and consequently of the numbers appearing
on them. If this assumption is valid, then every card is equally
likely to appear in any particular position. We now take the top card,
divide the number on this card by 20, and obtain a remainder.
Suppose that the first card has the number 99. This divided by 20
gives a remainder of 19, and the subject with number 19 would be
assigned to condition 1. The next card drawn, let us say, has
number 28 on it, and this divided by 20 gives a remainder of 8.
The individual with number 8 is assigned to condition 2. The next
card has the number 63 on it, and, since this gives a remainder of 3,
the subject with this number is assigned to condition 1. If another
card such as 39 is drawn next, this corresponds to individual 19, but
since 19 has already been assigned to one of the experimental
conditions, this number would be disregarded and another card
drawn. After the subjects have been divided at random into 2
groups, we could flip a coin to determine which group was to
receive the shock treatment and which the reward.

4 An interesting discussion of human bias entering into selection may
be found in Yule and Kendall (1947), pp. 337-339. See also Kendall and
Smith (1939).
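The card procedure can be written out step by step. This is a sketch and not a prescribed algorithm. A pseudorandom shuffle replaces the physical one, and alternating the two conditions for each newly drawn subject is our reading of the three-card example (condition 1, then 2, then 1). With 20 subjects and 100 cards, each subject is represented by exactly 5 cards, so every subject has the same chance at each draw. The function name `assign_from_cards` is our own.

```python
import random

def assign_from_cards(cards, num_subjects=20):
    """Map each card to subject (card % num_subjects). The first time a subject
    turns up, assign conditions alternately (1, 2, 1, 2, ...); cards whose
    subject has already been assigned are disregarded."""
    assignment = {}
    next_condition = 1
    for card in cards:
        subject = card % num_subjects
        if subject in assignment:
            continue  # subject already placed: disregard this card
        assignment[subject] = next_condition
        next_condition = 3 - next_condition  # switch between 1 and 2
        if len(assignment) == num_subjects:
            break
    return assignment

# The cards from the text: 99 -> 19, 28 -> 8, 63 -> 3, 39 -> 19 (disregarded).
print(assign_from_cards([99, 28, 63, 39]))  # {19: 1, 8: 2, 3: 1}

# A full shuffled pack of cards numbered 0-99.
rng = random.Random(1950)
pack = list(range(100))
rng.shuffle(pack)
assignment = assign_from_cards(pack)
group_1 = sorted(s for s, c in assignment.items() if c == 1)
group_2 = sorted(s for s, c in assignment.items() if c == 2)

# Coin flip deciding which group receives the shock treatment.
shock_group = group_1 if rng.random() < 0.5 else group_2
```

The first call reproduces the worked example. The shuffled pack then places all 20 subjects, 10 in each condition.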


4. TABLES OF RANDOM NUMBERS 


Tables of random numbers have been constructed to facili- 
tate random selection. Table I in the Appendix is such a table, 
and the randomness of the figures has been tested by various
methods.5 The table may be entered at any point, in any column
or any row. The point of entry should also be selected at random.
This may be done by assigning numbers corresponding to the rows 
and columns to a pack of cards and then selecting 2 of these, letting 
the first correspond to the column to be entered and the second to 
the row. 

To illustrate the use of Table I (Appendix) let us take the 
case where we wish to divide at random a group of 20 subjects 
into 2 groups of 10 subjects each. We number our subjects from 
00 to 19. Let us assume that column 5 and row 26 give the point 
of entry into the table. It makes no difference in which direction 
we read from the point of entry. We may read up or down, right 
to left, or left to right. Suppose that we read downward. Since 
we need two-place numbers, we shall take the digits in columns 5 
and 6. We divide each number as we read it by 20 and obtain
the remainder. The remainders of the numbers 00 to 99, when
divided by 20, take each of the values 00 to 19 equally often (5 times
each), so that every subject has the same chance of being drawn. We proceed as before in
assigning subjects to the two experimental conditions, ignoring 
remainders that correspond to subjects who have already been 
assigned. When we reach the bottom of the page, we can start
with the next two columns, i.e., 7 and 8, and read upward. When
we have reached the top of the page, we can take columns 9 and
10 and read downward, and so on, until we have the subjects
divided.

5 The methods used in testing randomness of numbers in such tables
are discussed in Kendall and Smith (1938, 1939), Yule (1938), and Peatman
and Schafer (1942).

Much more extensive tables than the table included in this book may
be found in Tippett (1927) and Fisher and Yates (1943).

This method of assignment could be considerably simplified
by taking the first 10 unlike remainders and letting these
subjects be one group and the remaining 10 the second group.
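The simplified rule (the first 10 unlike remainders form one group) is sketched below. Table I is not reproduced here, so a stream of pseudorandom two-digit numbers stands in for reading down two columns. The function name `split_first_unlike` is our own.

```python
import itertools
import random

def split_first_unlike(numbers, num_subjects=20, group_size=10):
    """Divide each two-digit number by num_subjects; the first group_size
    distinct remainders name one group, and the remaining subjects form
    the other group."""
    first = []
    for number in numbers:
        subject = number % num_subjects
        if subject not in first:
            first.append(subject)
            if len(first) == group_size:
                break
    second = [s for s in range(num_subjects) if s not in first]
    return sorted(first), second

# Stand-in for reading two columns of Table I: pseudorandom numbers 00-99.
rng = random.Random(26)
stream = (rng.randrange(100) for _ in itertools.count())
group_a, group_b = split_first_unlike(stream)
print(group_a)
print(group_b)
```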

We could also use the table to assign the 2 groups at random 
to the 2 treatments. We decide ahead of time that a high number 
will correspond to the shock treatment and then draw a random 
number for each of our 2 groups. The group with the higher 
number will then be the group given the shock treatment. 

Suppose that we wanted to divide a group of 30 subjects at 
random into 3 groups of 10 subjects each. We could let 00, 01, 
and 02 correspond to the first individual; 03, 04, and 05 to the 
second individual; and so on with 87, 88, and 89 corresponding to
the last individual. Numbers above 89 will be ignored when they 
appear in the table. We then determine our point of entry and 
direction of reading. The first 10 subjects picked by means of 
the table will then be assigned to one group, the next 10 to another 
group, and the remaining 10 to the third group. We could then 
assign the 3 experimental treatments at random to the 3 groups by 
letting one treatment be represented by a high number, another by 
a low number, and the third by a number in between. We then 
draw 3 numbers from the table, one for each of our groups, and the 
group with the highest number will be given treatment 1, the 
group with the lowest number treatment 3, and the group with
the number in between treatment 2.
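The procedure for 30 subjects and 3 treatments can be sketched as follows. This assumes Python's `random` module in place of Table I, and it numbers subjects 0 to 29. The helper names are hypothetical, and redrawing when two treatment numbers tie is an added assumption, since the text does not cover ties.

```python
import random

N_SUBJECTS = 30   # the mapping number // 3 below assumes exactly 30 subjects
GROUP_SIZE = 10


def divide_into_three(rng):
    """Numbers 00-02 name subject 0, 03-05 subject 1, ..., 87-89 subject 29.

    Numbers above 89 and subjects already picked are ignored. The first 10
    subjects picked form one group, the next 10 another, and the 10 never
    picked form the third group.
    """
    picked = []
    while len(picked) < 2 * GROUP_SIZE:
        number = rng.randint(0, 99)     # stand-in for two digits of Table I
        if number > 89:
            continue                    # ignore numbers above 89
        subject = number // 3           # three numbers per subject
        if subject not in picked:
            picked.append(subject)
    remaining = [s for s in range(N_SUBJECTS) if s not in picked]
    return [picked[:GROUP_SIZE], picked[GROUP_SIZE:], remaining]


def assign_treatments(groups, rng):
    """Highest draw gets treatment 1, lowest treatment 3, middle treatment 2.

    Redrawing on ties is an assumption, not part of the text.
    """
    while True:
        draws = [rng.randint(0, 99) for _ in groups]
        if len(set(draws)) == len(draws):
            break
    # Group indices ordered from the highest draw to the lowest draw.
    ranked = sorted(range(len(groups)), key=lambda g: draws[g], reverse=True)
    return {treatment: groups[g] for treatment, g in enumerate(ranked, start=1)}


rng = random.Random(30)
groups = divide_into_three(rng)
for treatment, members in sorted(assign_treatments(groups, rng).items()):
    print(f"Treatment {treatment}: subjects {sorted(members)}")
```

With only 2 groups, `assign_treatments` reduces to the rule given earlier: the group with the higher number receives the shock treatment.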


5. THE DIFFERENCE BETWEEN 2 MEANS 

For a practical illustration we have, in Table 1, 48 subjects 
in an elementary statistics class arranged alphabetically and with 
their error scores on a test of arithmetic ability. The test was 
given the first day the class met. Numbers were assigned to these 
subjects as indicated in the table. We then used the table of
random numbers to divide the subjects at random into 2 groups. 
Table I (Appendix) was entered at columns 05 and 06 and row 
05 of the third thousand numbers. Reading down columns 05 and 


24 EXPERIMENTAL DESIGN IN PSYCHOLOGICAL RESEARCH


06 and then up columns 07 and 08, the first 24 random numbers
corresponding to 24 of the subjects gave the division of subjects
shown in the table. The sum of scores for Group A is 275 and the 
mean value is 11.46. For Group B the sum of scores is 265 and 
the mean value is 11.04. The difference between the means is 
equal to .42. 
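The arithmetic can be checked in a few lines of Python. The scores below are transcribed from Table 1, with each group listed in the table's alphabetical order. The script is simply a calculator for the sums and means quoted above.

```python
# Error scores from Table 1, each group in the table's alphabetical order.
group_a = [5, 8, 23, 18, 9, 3, 4, 18, 23, 7, 4, 12,
           14, 10, 14, 7, 2, 12, 10, 12, 19, 10, 21, 10]
group_b = [13, 6, 7, 20, 18, 20, 6, 5, 11, 0, 11, 14,
           7, 11, 2, 15, 27, 8, 3, 10, 22, 12, 8, 9]

mean_a = sum(group_a) / len(group_a)
mean_b = sum(group_b) / len(group_b)

print(f"Group A: sum {sum(group_a)}, mean {mean_a:.2f}")   # Group A: sum 275, mean 11.46
print(f"Group B: sum {sum(group_b)}, mean {mean_b:.2f}")   # Group B: sum 265, mean 11.04
print(f"Difference A - B: {mean_a - mean_b:.2f}")          # Difference A - B: 0.42
```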

Now if our method of assignment is random, we have every 
reason to believe that nothing but chance or random differences 
will be found between the means of the 2 groups. We should not 
expect the 2 means to be exactly equal, but we should expect a 
certain amount of variation to be present as a result of chance or 
random factors in the assignment. In the long run, however, we 
would expect the average difference between the 2 means to ap- 
proach zero. Sometimes the mean of the first 24 subjects selected 
will be somewhat larger than the mean of the remaining subjects, 
and sometimes the mean of the first 24 subjects will be somewhat 
smaller. These differences between the means of samples so 
selected and not subjected to different treatments we believe to 
be distributed around zero for an indefinitely large number of 
repetitions. 
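The claim that mean differences between randomly selected halves are distributed around zero can be checked by simulation. The Python sketch below repeats the random division of the 48 scores from Table 1 many times. It assumes `random.sample` as a stand-in for the table of random numbers, and `random_split_differences` is a hypothetical helper.

```python
import random

# All 48 error scores from Table 1, in the table's alphabetical order.
SCORES = [13, 5, 6, 7, 20, 18, 8, 23, 20, 6, 18, 9, 5, 3, 11, 0,
          4, 18, 11, 23, 7, 4, 14, 7, 12, 11, 14, 10, 2, 14, 15, 27,
          7, 8, 2, 3, 10, 12, 10, 12, 22, 19, 10, 12, 8, 21, 10, 9]


def random_split_differences(scores, repetitions, seed):
    """Repeat the random division into two equal groups, returning the
    difference (first group minus second group) between the group means."""
    rng = random.Random(seed)
    half = len(scores) // 2
    total = sum(scores)
    differences = []
    for _ in range(repetitions):
        first = rng.sample(scores, half)        # the first 24 subjects selected
        mean_first = sum(first) / half
        mean_second = (total - sum(first)) / (len(scores) - half)
        differences.append(mean_first - mean_second)
    return differences


diffs = random_split_differences(SCORES, 10_000, seed=1)
print(f"average difference: {sum(diffs) / len(diffs):+.3f}")
print(f"positive: {sum(d > 0 for d in diffs)}, negative: {sum(d < 0 for d in diffs)}")
```

The average difference should come out close to zero, with roughly as many positive differences as negative ones. This is the same pattern Example 2 at the end of the chapter asks a class to observe by hand.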

If the measures or scores for Groups A and B represented 
scores obtained under different experimental conditions, a pertinent 
question to be raised is whether the difference between the means 
is sufficiently large to reject the hypothesis that the 2 conditions 
are equally effective, i.e., that the observed difference is a result 
of chance fluctuation to be expected when the true difference is 
zero. If the 2 conditions are equally effective, then in the long
run we would expect the means of the 2 groups to be equal within 
the limits imposed by the random assignment of the subjects to 
the 2 groups. 

In order to evaluate the observed difference between the 2 
means, we need a measure of the variability to be expected when 
the hypothesis of no difference holds. If the observed difference 
exceeds the variability to be expected by a stated degree—and we 
shall put this into more exact terms later—then the hypothesis 
of no difference may be rejected. The hypothesis of no difference 
or, more precisely, the hypothesis that the mean of the sampling 
distribution of the difference between the 2 means is equal to 
zero is often termed a null hypothesis. 




TABLE 1, Scores on an Arithmetic Test for 48 Subjects in an Elementary 
Statistics Class—Subjects Were Divided at Random 
into 2 Groups of 24 Subjects Each 


Subjects  Score   Numbers    Order of    Scores of   Scores of
                  Assigned   Selection   Group A     Group B
KWA          13   00-01                                13
VRA           5   02-03        16           5
ELB           6   04-05                                 6
EJB           7   06-07                                 7
JMB          20   08-09                                20
RMB          18   10-11                                18
PLB           8   12-13         2           8
SWC          23   14-15        23          23
JHC          20   16-17                                20
AWC           6   18-19                                 6
TEF          18   20-21         7          18
ERF           9   22-23         6           9
PCF           5   24-25                                 5
JEG           3   26-27         8           3
PEG          11   28-29                                11
WG            0   30-31                                 0
CPG           4   32-33         4           4
SCG          18   34-35        21          18
DG           11   36-37                                11
PLH          23   38-39        14          23
RCH           7   40-41        12           7
CWH           4   42-43        18           4
JDH          14   44-45                                14
HJH           7   46-47                                 7
JBJ          12   48-49         5          12
BAK          11   50-51                                11
DNM          14   52-53        10          14
GMM          10   54-55        19          10
NLM           2   56-57                                 2
CAM          14   58-59        24          14
HHM          15   60-61                                15
JN           27   62-63                                27
HCO           7   64-65        13           7
DBO           8   66-67                                 8
GEP           2   68-69         3           2
JMR           3   70-71                                 3
RNR          10   72-73                                10
MMR          12   74-75         9          12
MS           10   76-77        17          10
JSS          12   78-79        11          12
JES          22   80-81                                22
EES          19   82-83         1          19
TWS          10   84-85        15          10
JAS          12   86-87                                12
CDS           8   88-89                                 8
EAS          21   90-91        22          21
AET          10   92-93        20          10
VVW           9   94-95                                 9
Sum                                       275         265




The measure of variability to be expected under the null 
hypothesis is provided by the replication of the experiment and the 
method of random assignment of the subjects to the 2 groups. 
If A and B did correspond to 2 experimental conditions, then we 
would require 2 subjects, one for each condition, for one trial of 
the experiment. Each additional 2 subjects, one of whom is as- 
signed to A and the other to B, constitutes an additional replication 
or trial of the experiment. With 24 subjects in each of the 2 
groups, we would have 24 replications of the experiment. 

It is commonly recognized that increasing the number of replications
reduces the standard error of a statistic such as the mean, which
varies inversely with the square root of the number of replications;6
other factors, for example, more reliable measuring instruments,
will also reduce it. But there is one fundamental thing that
replication provides which cannot be accomplished in any other
manner. It provides a determination of the uncontrolled random
variation to be expected among individuals treated alike, i.e., subjected
to the same experimental condition. A single replication of the
experiment could not provide this measure; we could assign the
2 subjects at random to the groups, but the single subject in each
group would not give a measure of variability, as this requires
at least 2 subjects in each group.
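Both points can be illustrated in Python: the square-root relation, and the impossibility of measuring variation with one subject per group. This excerpt does not show the book's formulas (2) and (4). The sketch assumes the usual sample standard deviation with an n − 1 divisor, as computed by `statistics.stdev`. The helper names are hypothetical.

```python
import math
from statistics import StatisticsError, stdev

# Error scores of the two groups in Table 1.
group_a = [5, 8, 23, 18, 9, 3, 4, 18, 23, 7, 4, 12,
           14, 10, 14, 7, 2, 12, 10, 12, 19, 10, 21, 10]
group_b = [13, 6, 7, 20, 18, 20, 6, 5, 11, 0, 11, 14,
           7, 11, 2, 15, 27, 8, 3, 10, 22, 12, 8, 9]


def standard_error_of_mean(scores):
    """Estimated standard error: sample SD (n - 1 divisor) over sqrt(n)."""
    return stdev(scores) / math.sqrt(len(scores))


def standard_error_known_sigma(sigma, n):
    """sigma / sqrt(n): quadrupling the replications halves the standard error."""
    return sigma / math.sqrt(n)


def can_estimate_variation(scores):
    """A standard deviation needs at least 2 scores from subjects treated alike."""
    try:
        stdev(scores)
    except StatisticsError:
        return False
    return True


for name, scores in (("A", group_a), ("B", group_b)):
    print(f"Group {name}: s = {stdev(scores):.2f}, "
          f"standard error of mean = {standard_error_of_mean(scores):.2f}")

print(standard_error_known_sigma(6.0, 24), standard_error_known_sigma(6.0, 96))
print("one subject per group:", can_estimate_variation([group_a[0]]))
```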

Randomization provides some assurance that the sources of 
uncontrolled variation will be properly randomized in the 2 groups, 
and replication provides us with a measure of this uncontrolled 
variation. This simple experimental design, consisting of random
assignment and replication, would provide us with a valid estimate
of error (uncontrolled variation) against which to compare the
variation observed between the means of the 2 experimental con-
ditions A and B. In the chapters which follow, we shall demon-
strate various modifications of the relatively simple experimental
design involving but 2 groups. The point to be emphasized here
is that without randomization and replication, the interpretation
of the results of the experiment is not possible in any objective
sense.8


6 See formula (4).
7 See formula (2).
8 “It is possible, and indeed it is all too frequent, for an experiment 




6. THE TEST OF HYPOTHESES 


All experiments involve the testing of some hypothesis, and 
this hypothesis is often referred to as the null hypothesis. In the 
fictitious experiment referred to above, the hypothesis was that 
the 2 experimental conditions A and B were equally effective. To 
state this more precisely, the hypothesis specified that if the ex- 
periment were repeated an indefinitely large number of times, the 
mean of the sampling distribution of the differences between the 
means would be equal to zero. This hypothesis is usually the one 
of primary interest in experimental work involving a difference 
between 2 means.9

The evaluation of the observed mean difference (based upon 
the performance of subjects treated differently) by comparison 
with the measure of uncontrolled variation (based upon the per- 
formance of subjects treated alike) is called a test of significance. 
A test of significance is always a test of a statistical hypothesis 
and is a method for enabling the experimenter to determine
whether or not he wishes to regard the hypothesis being tested as
tenable or untenable.

The test of significance is based upon theoretical distributions,
of which one is the normal distribution. Other theoretical dis-
tributions will be encountered in later discussions. In essence a
test of significance yields a probability. By probability, sym-
bolized by P, we shall mean theoretical relative frequency. If the


to be so conducted that no valid estimate of error is available. In such a case
the experiment cannot be said, strictly, to be capable of proving anything.
Perhaps it should not, in this case, be called an experiment at all, but be added
merely to the body of experience on which, for lack of anything better, we may
have to base our opinions. All that we need to emphasize immediately is
that, if an experiment does allow us to calculate a valid estimate of error, its
structure must completely determine the statistical procedure by which this
estimate is to be calculated. If this were not so, no interpretation of the data
could ever be unambiguous; for we could never be sure that some other equally
valid method of interpretation would not lead to a different result.” Fisher
(1942), p. 34.

9 In some experiments other hypotheses about the difference between
the 2 means may be of interest; for example, the hypothesis that the mean of
the sampling distribution of the difference between the means is some specified
value, say 5.8 or 4.3. The hypothesis then to be evaluated is that the observed
sample difference does not exceed or fall below this value by an amount which
cannot be reasonably attributed to random variation.




probability yielded by the test of significance is small, then either 
the hypothesis and its related assumptions are false, or else an 
unusual (i.e., a rare or improbable) event has occurred. It is, of
course, left to the experimenter to decide how small a probability
he will demand before rejecting the hypothesis, i.e., before
assuming it to be false. But it should be clear that carrying
out an experiment would be useless if the experimenter refused to
reject the hypothesis tested, regardless of how improbable it is in
terms of the results he obtains.10

Most experimenters take a probability of .05 as a standard 
level. This is a convenient reference standard, and a test of 
significance which yields a probability of .05 to .01 will be regarded 
as significant and the hypothesis being tested will be rejected.
A probability of .01 or smaller will be regarded as very significant
and will simply mean that the hypothesis being tested will be
rejected with greater confidence than when the probability is be-
tween .05 and .01. The meaning of these probability values may
be illustrated by assuming that a difference between 2 means has
been obtained and that we set up the hypothesis that the mean of
the sampling distribution of the mean differences is equal to zero.
Under this hypothesis, a comparison of the observed mean dif-
ference with the measure of uncontrolled variation, by a test of
significance, yields a probability of .01. Now if the hypothesis
being tested is true, then something has happened in this ex-
periment that would be expected to happen but once in 100 times.
The experimenter, if he follows the traditional practice, would 
regard this as improbable. Hence, he would reject the hypothesis 
that the mean of the sampling distribution of differences between 
the means is zero. If this hypothesis is rejected and he infers that 
some difference does exist between the means of his 2 groups 
that cannot be accounted for by random, uncontrolled variation, 
he may reason further that the difference observed is the result of 
his different experimental conditions. This reasoning, of course, 
is not part of the test of significance, but depends upon the struc- 
ture of the experiment and the controls exercised by the experi- 
menter. The test of significance is a test of a statistical hypothesis. 
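The reference standard above can be written as a small decision rule. To show where such a P might come from, the sketch below treats Groups A and B of Table 1 as if they had received two different conditions. It then estimates P by a Monte Carlo randomization test, which asks how often a purely random division of the same 48 scores gives a mean difference at least as large as the observed .42. This randomization test is a technique swapped in for illustration, not the test of significance the text develops. `classify` and `randomization_p` are hypothetical names.

```python
import random

# Table 1 scores, treated here as if A and B had received different conditions.
GROUP_A = [5, 8, 23, 18, 9, 3, 4, 18, 23, 7, 4, 12,
           14, 10, 14, 7, 2, 12, 10, 12, 19, 10, 21, 10]
GROUP_B = [13, 6, 7, 20, 18, 20, 6, 5, 11, 0, 11, 14,
           7, 11, 2, 15, 27, 8, 3, 10, 22, 12, 8, 9]


def classify(p):
    """The reference standard adopted in the text."""
    if p <= 0.01:
        return "very significant"
    if p <= 0.05:
        return "significant"
    return "not significant"


def randomization_p(group_a, group_b, repetitions=20_000, seed=1950):
    """Monte Carlo estimate of P: the proportion of random divisions of the
    pooled scores whose mean difference (either sign) is at least as large
    as the observed one."""
    pooled = group_a + group_b
    n_a, total = len(group_a), sum(pooled)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    rng = random.Random(seed)
    as_large = 0
    for _ in range(repetitions):
        sample = rng.sample(pooled, n_a)
        diff = sum(sample) / n_a - (total - sum(sample)) / (len(pooled) - n_a)
        if abs(diff) >= observed - 1e-12:   # tolerance for floating-point ties
            as_large += 1
    return as_large / repetitions


p = randomization_p(GROUP_A, GROUP_B)
print(f"P = {p:.2f}: {classify(p)}")
```

For these data, P is large. This fits the fact that Groups A and B were formed purely at random and received no different treatments.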


10 Fisher (1942), p. 13.




7. TWO KINDS OF ERRORS 


In following the procedure described above we shall some- 
times be in error in the inferences drawn concerning the hypothesis 
tested. The frequency of one kind of error, the rejection of a true
hypothesis, is fixed by the significance standard adopted. If we
always reject a hypothesis when the test of significance yields a
probability of .05 or smaller, and if we consistently follow this
standard, then we shall incorrectly reject as false 5 per cent of
those hypotheses tested which are actually true. If we demand that
the probability be .01 or less before rejecting the hypothesis tested
and consistently follow this procedure, we shall incorrectly reject
as false 1 per cent of the true hypotheses tested. The danger in
demanding a significance value of .01 or some smaller probability
is that a false hypothesis will then be accepted as true more
frequently than would be the case if we demanded only a value
of .05 before rejecting the hypothesis.

These 2 kinds of error are commonly called Type I and Type
II errors.11 If a hypothesis tested is actually true, but the test of
significance results in our rejection of the hypothesis, then a Type
I error has been made. If the hypothesis being tested is actually
false, but the test of significance results in our acceptance of the
hypothesis, then an error of Type II has been made. The con-
venient reference standard of rejecting hypotheses when the
probability is .05 or smaller is a common one and will be followed
consistently herein. In so doing, we shall have confidence that
this standard, consistently applied, will in the long run result in
only 5 per cent of the true hypotheses tested being rejected.
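The trade-off between the two kinds of error can be seen in a simulation of many repetitions of an experiment with 24 subjects per group. As a simplifying assumption, the sketch tests the difference between means with a normal-curve (z) test and a known population standard deviation, not with any test given in the text. The true difference of 0.5 standard deviation used for the Type II case is an arbitrary hypothetical value.

```python
import math
import random
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def two_sided_p(difference, sigma, n):
    """P for a difference between 2 means of n cases each, sigma assumed known."""
    z = difference / (sigma * math.sqrt(2 / n))
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))


def rejection_rate(true_difference, alpha, n=24, sigma=1.0, repetitions=20_000, seed=1):
    """Fraction of simulated experiments in which the hypothesis of zero
    difference is rejected at the significance standard alpha."""
    rng = random.Random(seed)
    se_mean = sigma / math.sqrt(n)      # a group mean is normal with this SD
    rejections = 0
    for _ in range(repetitions):
        mean_a = rng.gauss(true_difference, se_mean)
        mean_b = rng.gauss(0.0, se_mean)
        if two_sided_p(mean_a - mean_b, sigma, n) <= alpha:
            rejections += 1
    return rejections / repetitions


for alpha in (0.05, 0.01):
    type_1 = rejection_rate(0.0, alpha)        # hypothesis of no difference is true
    type_2 = 1 - rejection_rate(0.5, alpha)    # hypothesis is false (hypothetical 0.5 sigma)
    print(f"alpha = {alpha}: Type I rate {type_1:.3f}, Type II rate {type_2:.3f}")
```

The Type I rate settles near the chosen standard. Moving from .05 to .01 lowers it but raises the Type II rate, which is the danger described above.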

We must recognize also that the failure to reject a hypothesis 
does not therefore establish its truth, but only that the sample 
data offer no evidence against the hypothesis in terms of the 
standard agreed upon. A test of significance can only make a
decision based upon it more or less likely to be correct. Regarding a hypothesis that
the difference between 2 means is equal to zero as tenable does not 
mean that it is the only hypothesis which would be regarded as ten- 
able. We might also obtain a nonsignificant result (P greater than 


11 A more detailed discussion of these errors may be found in Churchman 
(1948). 




.05) if we tested the hypothesis that the mean of the sampling dis- 
tribution of the difference between the means was, let us say, −.09.

It must also be made clear that no single experiment can 
establish the absolute proof of any result, however significant 
(regardless of the smallness of the probability) the results may 
happen to be. The 1 chance in 100 (P = .01), the 1 chance in 
1,000 (P = .001), or the 1 chance in 1,000,000 (P = .000001), for
that matter, “will undoubtedly occur, with no less and no more 
than its appropriate frequency, however surprised we may be that 
it should occur to us. In order to assert that a natural phe- 
nomenon is experimentally demonstrable we need, not an isolated 
record, but a reliable method of procedure. In relation to the test 
of significance, we may say that a phenomenon is experimentally 
demonstrable when we know how to conduct an experiment which 
will rarely fail to give us a statistically significant result.”12


8. PRACTICAL VERSUS STATISTICAL SIGNIFICANCE 


We may distinguish between practical significance and the 
kind of statistical significance just discussed. A test of significance 
might yield a significant result, probability less than .05, when the 
hypothesis that one mean equals a second mean was tested. Let 
us assume that these 2 means are obtained from an experiment 
upon the influence of milk upon weight. Two thousand subjects 
were divided at random into 2 groups; one group was given 1 
pt. of milk a day and the other group 1 qt. of milk a day over a 
given period of time. We shall assume, for the sake of the argu- 
ment, that all dietary and other conditions were under control, 
though this is an unlikely assumption. At the end of the ex- 
periment the mean weight for each of the 2 groups is found, and the 
hypothesis mentioned is tested. The test of significance yields a 
probability value equal to .05, but the observed difference between 
the means of the 2 groups is extremely small. 

As was pointed out earlier, additional replications serve to
decrease the standard error, our measure of uncontrolled variation.
The 1,000 replications in this particular experiment have resulted in an extremely
small measure for the uncontrolled error, against which the dif- 


12 Fisher (1942), pp. 13-14. 




ference between the means is compared. Hence, although the 
test of significance indicates that the results are statistically 
significant, common sense would tell us that if the difference in 
mean weight between the 2 groups is so small that it requires 1,000 
cases in each group to detect the difference, it is not likely to be of 
any practical significance. 

In another research study we might find a difference of one 
tenth of an IQ point between the means of a group of boys and a 
group of girls. With an extremely large number of cases in each 
group, this difference might meet the requirements of statistical
significance. What practical significance this minute difference 
might have is another question. 

On the other hand, a difference between 2 means which is 
found to be statistically significant when the number of replications 
is quite small is obviously a difference which is easily detected and 
is much more likely to be of sufficient magnitude to be of some 
practical importance. For, in cases such as this, the measure of 
uncontrolled variation based upon a small number of replications 
is likely to be large, and hence the difference between the means 
observed will not turn out to be statistically significant unless it is 
fairly large also. 

We should keep in mind that the larger the measure of un- 
controlled variation—and this to a certain extent is a function of 
the number of replications—the larger must be the difference be- 
tween 2 means before the test of significance will yield a statis- 
tically significant result. On the other hand, the smaller the value 
of the measure of the uncontrolled variation, the smaller the 
observed difference between 2 means may be and still have the 
test of significance yield a statistically significant result. 
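The dependence on the number of cases can be made concrete with the IQ illustration. The sketch holds a hypothetical difference of 0.1 IQ point fixed and assumes a population standard deviation of 15 points. It uses a normal-curve test with a known standard deviation as a simplification, not a method taken from the text, and shows how P falls as the number of cases per group grows.

```python
import math
from statistics import NormalDist


def p_for_difference(difference, sigma, n_per_group):
    """Two-sided P for a difference between 2 means, sigma assumed known."""
    standard_error = sigma * math.sqrt(2 / n_per_group)
    z = difference / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical values: a 0.1-point IQ difference, population SD of 15 points.
for n in (100, 1_000, 100_000, 1_000_000):
    print(f"{n:>9,} cases per group: P = {p_for_difference(0.1, 15.0, n):.6f}")
```

The difference itself never changes. Only the standard error shrinks, so "statistically significant" here says nothing about whether 0.1 IQ point matters in practice.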


9. EXAMPLES 

1. The scores of 48 subjects on an arithmetic test are given 
in Table 1, page 25. (a) Using the table of random numbers 
(Table I, Appendix), select 2 groups of 15 subjects from the list 
of 48 subjects. (b) Find the mean error score for each group. 

2. Call the first 15 subjects selected in Example 1, Group A,
and the second 15 subjects, Group B. Find the difference between
the means of the 2 groups, subtracting B from A. Using the similar 




values obtained by the other members of the class, make a dis- 
tribution of the differences. (a) Taking all the differences obtained 
by the members of the class, what do you find with respect to the 
number of differences with positive signs and the number of dif- 
ferences with negative signs? (b) Find the average difference 
obtained by all members of the class. Is this value in accord with 
what you would expect? 

3. Divide the subjects in Group A into 3 groups of 5 subjects 
each. Do this by letting the first 5 subjects selected in Example 
1 be designated as A-1, the second 5 as A-2, and the third 5 as 
A-3. Do the same for the subjects in Group B. (a) Now find 
the means of the subgroups and the difference between the means, 
i.e., subtract B-1 from A-1, B-2 from A-2, and B-3 from A-3. 
(b) Using the similar values found by the other members of the 
class, make a distribution of the differences. (c) Do these dif- 
ferences show a wider range than the differences of Example 2? 

4. Select a row and column at random and enter the table 
of random numbers, Table I, Appendix. Record the first 50 digits 
(see the discussion in the chapter on the use of Table I). Count 
the number of odd and the number of even digits that you have 
recorded. How far off are your counts from 25 odd and 25 even? 
If other members of the class have performed this exercise, com- 
pare your results with theirs. 

5. In accordance with procedures discussed in the chapter, 
divide the members of your class at random into 2 equal groups. 
Keep a record of the members in each group. After the first 
examination, compare the mean scores of the 2 groups. 

6. Distinguish between (a) sample and population; (b) 
statistic and parameter; (c) standard deviation and standard 
error; (d) frequency distribution and sampling distribution; (e) 
biased and unbiased estimates; (f) Type I and Type II errors. 


CHAPTER 3 


Probability and Experimental Design


1. THE FARMER'S DIVINING ROD 


A farmer from near-by Whidbey Island visited the psycho-
logical laboratory of the University of Washington. He had with
him a carved whalebone and claimed that in his hands the bone
was an extremely powerful instrument, capable of detecting the
existence of even small quantities of water. To support his claim he
said that several of his neighbors on Whidbey had tried unsuccess-
fully to bring in water wells. Finally they had called upon him
for help. He had taken his whalebone, grasped one fork in each
hand, and walked slowly over their ground. Suddenly the point
or apex of the bone had dipped sharply toward the ground. When
his neighbors had drilled wells at the points he had located in this
fashion, they had found water.

The farmer added that he was unable to explain his peculiar
power. His neighbors were unable to use the whalebone in locat-
ing water. It had to be in his hands before it would dip sharply,
indicating the presence of water. He was somewhat disturbed by
his ability, and he thought that perhaps the psychologists at the
university would be interested in examining him and telling him
why it was that he was able to use the bone so effectively while
others could not. He himself thought that it had something to
do with “magnetism” that emanated from his body. Anyway, he
would be willing to demonstrate his ability so that the psycholo-
gists could see for themselves. Perhaps then they could explain
it to him.

At this point in his story, the farmer took a paper cup and
filled it with water and placed the cup on the floor. He then
grasped the whalebone and held it stiffly in front of him as he
moved slowly about the room. When the apex of the bone passed




over the cup of water, his arms trembled slightly and the bone 
dipped toward the ground. The farmer showed signs of strain 
and remarked that the force was so powerful, he was almost unable 
to keep the bone in his grip. 

The psychologist thanked the farmer for his demonstration 
and said that he would like to test the farmer's ability to locate 
water under controlled conditions, but that this would require 
some preparation. Would the farmer be agreeable to returning 
for these tests next week? The farmer agreed and promised to 
return at an appointed time. 

Now it is obvious that the "evidence" the farmer cited as to 
his ability is not the kind of evidence that would be satisfactory 
to a trained scientist. What evidence will be satisfactory? How 
shall the claim of the farmer be investigated? Let us see how the 
psychologist designed a simple experiment which would yield data 
bearing upon the problem. 


2. A SIMPLE EXPERIMENTAL DESIGN 


When the farmer returned to the psychological laboratory 
the next week, he was greeted by the psychologist and taken to 
one of the laboratory rooms. Spread around the floor of the room 
were 10 pieces of plywood about 1 by 1 ft. in size. Numbers from 
1 to 10 had been marked upon the top of each square. The pieces 
of plywood were resting upon tin cans, about No. 2 in size. The 
psychologist explained that he had used a table of random numbers 
and had picked 5 cans to be filled with water while the remaining 
5 were left empty. He emphasized that under 5 of the sections 
of plywood were cans with water and under 5 other sections were 
dry cans and that the arrangement of the empty and filled cans 
was purely a random one. The psychologist now wanted the 
farmer to take his whalebone and attempt to divide the 10 squares 
of plywood into two groups. One group would be the 5 covering 
the cans filled with water and the other group would be the 5 cover- 
ing the empty cans. The farmer did not need to make his choice 
in any particular order; he was merely to divide the set of 10 
sections into 2 groups of 5 each.1


1 This simple experiment is modeled after Fisher's (1942) classical case
of the lady tea taster. 


PROBABILITY AND EXPERIMENTAL DESIGN 35 


Let us examine this experiment in some detail. We shall 
pay particular attention to the kinds of choices the farmer might 
make, the hypothesis which the experimenter is testing, and the 
manner in which the test of the hypothesis is to be made. 

The psychologist may reason in this way: Let us assume
that the farmer does not possess any particular powers which 
enable him to locate water with his whalebone; that the only 
factor which is operating in determining his choice is chance. 
This is the null hypothesis which the experiment is designed to test. 


3. PERMUTATIONS AND COMBINATIONS 

The possible outcomes of the experiment can be demon-
strated in a simple way by the rules for permutations and com-
binations.2 Permutations refer to the number of arrangements
(orders) in which a set of n distinct objects may be arranged. In 
general, the number of permutations of n distinct objects, taken r
at a time, is given by

$$ {}_nP_r = \frac{n!}{(n - r)!} \qquad (5) $$


where n! is called factorial n and represents (n)(n — 1)(n — 2) 
and so on, or the product of all the successive integers from n to 1. 


The factorial 0! is always taken equal to 1.

In the problem at hand the number of orders in which 5
sections of plywood may be selected from the available 10 is

$$ {}_{10}P_5 = \frac{10!}{(10 - 5)!} = \frac{(10)(9)(8)(7)(6)(5)(4)(3)(2)(1)}{(5)(4)(3)(2)(1)} = (10)(9)(8)(7)(6) = 30{,}240 $$


This figure gives us every possible set of 5 arranged in every 
possible order, i.e., any 1 of the 10 sections may be selected first; 
this choice may be followed by any 1 of the remaining 9; this 
choice may be followed by any 1 of the remaining 8; and so on 
until 5 have been selected. 


2 Any elementary algebra text contains sections on permutations
and combinations. A thorough discussion will be found in books on
probability. See, for example, Uspensky (1937).




But in this experiment the psychologist is not going to 
demand that the farmer select the set of 5 cans containing water 
in the particular order in which the psychologist put the water into 
them or in any other particular order. All that the psychologist 
is interested in is the set of 5, once the set has been selected. As 
far as he is concerned, the set 10, 5, 8, 2, and 3, selected in that 
order, is equivalent to the set 8, 3, 2, 10, and 5, selected in that 
order, or in any other possible order. 

It may be noted that the set of 5 selected objects or sections 
may themselves be arranged in (5)(4)(3)(2)(1) = 120 orders, 
according to formula (5). Thus, dividing 30,240 by 120, we 
obtain 252 ways in which a set of 5 objects may be selected from 
10, if the arrangement or order is ignored. In general, the number 
of combinations (arrangement ignored) of n distinct objects 


taken r at a time is given by

$$ {}_nC_r = \frac{{}_nP_r}{r!} = \frac{n!}{r!\,(n - r)!} \qquad (6) $$

or in the present problem

$$ {}_{10}C_5 = \frac{10!}{5!\,5!} = \frac{(10)(9)(8)(7)(6)}{(5)(4)(3)(2)(1)} = \frac{30{,}240}{120} = 252 $$
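Formulas (5) and (6) can be checked in Python, both directly and by brute-force enumeration of every ordered and unordered choice of 5 sections out of 10. `math.perm` and `math.comb` require Python 3.8 or later and are used here only as a cross-check.

```python
import itertools
import math

n, r = 10, 5   # 10 plywood sections, 5 of which cover cans of water

ordered = math.factorial(n) // math.factorial(n - r)   # formula (5)
unordered = ordered // math.factorial(r)               # formula (6)

# Cross-check against the standard library and against brute-force counting.
assert ordered == math.perm(n, r)
assert unordered == math.comb(n, r)
assert sum(1 for _ in itertools.permutations(range(1, n + 1), r)) == ordered
assert sum(1 for _ in itertools.combinations(range(1, n + 1), r)) == unordered

print(ordered, unordered, math.factorial(r))   # 30240 252 120
```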


Now the best that the farmer could possibly do in the 
present experiment would be to select the particular set of 5 
which happened to be those with water in the cans, and this 
particular selection would be 1 out of 252 possibilities. If only 
chance factors were operating in determining the selection and 
this experiment was repeated an indefinitely large number of
times, then we would expect this particular set to be selected with
a frequency approaching 1/252. Thus, 1 divided by 252 gives a
value of .004 (more precisely, .00397), and this may be regarded
as a probability.3 We say that the value of P is .004 or that this


3 As pointed out earlier, we shall regard probability as a statement
concerning theoretical relative frequency.




result would be expected by chance alone only about 4 times in 
1,000. This value of P is obviously smaller than that of .05, 
which we agreed to regard as significant. We also agreed that a 
significant value of P would result in a rejection of the hypothesis 
being tested. Hence, if the farmer is able to choose this particular 
set of 5 with the aid of his whalebone, then we should undoubtedly 
feel that the probability of this occurring by chance alone is 
sufficiently small that the hypothesis with its related assumptions 
is not considered tenable. 
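The meaning of P as theoretical relative frequency can be illustrated by simulating the null hypothesis. The Python sketch below assumes that both the experimenter's arrangement and the farmer's choice are random draws of 5 sections out of 10, with `random.sample` standing in for the table of random numbers. It compares the simulated frequency of a perfect selection with the exact value 1/252. `perfect_selection_frequency` is a hypothetical helper.

```python
import random
from fractions import Fraction
from math import comb


def perfect_selection_frequency(repetitions=100_000, seed=1950):
    """Relative frequency with which a chooser guided by chance alone picks
    exactly the 5 sections covering water."""
    rng = random.Random(seed)
    sections = range(1, 11)
    hits = 0
    for _ in range(repetitions):
        water = set(rng.sample(sections, 5))    # experimenter's random arrangement
        choice = set(rng.sample(sections, 5))   # farmer's choice under the null hypothesis
        hits += water == choice
    return hits / repetitions


exact_p = Fraction(1, comb(10, 5))
print(f"exact P = {exact_p} = {float(exact_p):.5f}")         # exact P = 1/252 = 0.00397
print(f"simulated frequency = {perfect_selection_frequency():.5f}")
print("significant at .05:", exact_p <= Fraction(5, 100))    # significant at .05: True
```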


4. EXPERIMENTAL CONTROLS 


At this point we shall do well to consider again what the re- 
jection of the hypothesis means. If the hypothesis is rejected, 
this means only that the experimenter is not willing to assume that 
chance determined the selection. It does not prove that the 
whalebone has had any particular influence upon the farmer’s 
choice. This is something that the test of the hypothesis has 
nothing to do with. The psychologist might be willing to assume 
or infer that the whalebone played some part in the farmer’s 
selection, but he would undoubtedly do this only if other possible 
explanations had been ruled out in terms of experimental controls. 
What are some of these alternative explanations? 

Without the experimenter knowing about it, the farmer may 
have used the toe of his foot to tap the cans under the board. 
Since in this manner the cans filled with water could easily be 
distinguished from the empty cans, it would account for a perfect 
selection upon the part of the farmer. If this was the basis of 
the farmer's selection, then obviously the whalebone had nothing 
to do with his choices. The farmer might even deny that he had 
used this cue, the sound of the can when tapped with his foot, if 
questioned about it. But the psychologist knows that many of 
our choices and judgments are based upon factors of which we are 
not aware. It would be the experimenter’s responsibility to rule
out this possibility by observation or by some other control.

Again, the psychologist would want to make sure that the 
farmer had not tapped the tip of the whalebone on the tops of the 
plywood sections. If the farmer had done this, his choice might 
be determined by the differences in sound of the sections covering
the water-filled cans and the sections covering the empty cans.
He could thus make a perfect selection of the 5 water-filled cans, 
and the experimenter would reject the hypothesis of chance. But 
note again that the rejection of the hypothesis of chance does not 
establish the validity of the farmer's claim concerning the influence 
of the whalebone. 

Another possible explanation of a perfect selection might be 
that the experimenter had spilled some of the water on the floor 
in filling the cans. This water might have been carefully mopped 
up, but slight cues may have remained. The absence of dust or 
the cleanliness of the floor under the sections of plywood containing
water, as a result of the mopping, might provide cues for
the farmer’s choice. Or perhaps the experimenter gave some sign,
a holding of his breath or an unconscious biting of his lips, as the
farmer moved the whalebone over the sections containing water.
The farmer’s choice might thus be based upon one of these un- 
conscious gestures or reactions of the experimenter, without, of 
course, the experimenter, and perhaps even the farmer, being 
conscious of the fact that these cues were the basis of the farmer’s 
choice. 

In a well-designed experiment these factors and many others 
that the experimentalist may suggest must be controlled if logical 
conclusions are to be drawn concerning the results of the experiment.⁴
It is to be emphasized that these logical conclusions are
derived from the structure of the experiment and the nature 
of the controls exercised. They do not come from the test of 
the statistical hypothesis. The statistical test indicates only the 
probability of a particular set of results upon the basis of the 
statistical hypothesis tested, namely, that chance alone is deter- 
mining the outcome. It does not prove that the farmer bases his 
choice on the whalebone or that the whalebone is in any way 
influential in determining the outcome. If the experimenter 
rejects the hypothesis of chance, he must still examine the structure
of his experiment and the nature of his experimental controls in
making whatever explanation he does make as to why he obtained
the particular results he did. Needless to say, most psychologists,
in terms of their knowledge of experiments upon related problems,
would want to examine critically and carefully the experimental
controls in the face of perfect results upon the part of the farmer.⁵
The accumulated evidence upon the effectiveness of divining rods
in locating water is negative.⁶

4 See the discussion of experimental controls in extrasensory
experiments by Kennedy (1939).


5. A LIMITATION IN THE DESIGN 


Let us suppose that the farmer claims that the psychologist 
has set too high a standard; that occasionally the whalebone fails 
him and that the psychologist should not expect him to make a 
perfect selection of the 5 cans. What will the attitude of the 
psychologist be, for example, if the farmer selects 4 cans with 
water, but for his fifth choice makes an error and selects 1 of
the 5 empty cans?

The experimenter’s attitude will again depend upon the
number of ways in which this particular result may occur. Ignoring
the order of selection of the cans, we may note that from the
set of 5 cans with water, 4 cans may be selected in 5 ways. This
is given by formula (6). Thus,

$$ {}_{5}C_{4} = \frac{5!}{4!\,1!} = \frac{(5)(4)(3)(2)(1)}{(4)(3)(2)(1)(1)} = 5 $$


Independently of this selection, 1 can may be selected from the set
of 5 dry cans in 5 ways also, as determined by the formula for
combinations. Hence, there are (5)(5), or 25, ways in which
this particular event can occur.⁸

5 [...] needs to be made with respect to this [...] experiment. It is
possible, though it may not seem to the psychologist that it should
happen upon this particular occasion, that the perfect selection of the
5 cans with water may be 1 of the 4 in 1,000 which chance indicates will
occur with this particular frequency. If the experimenter repeated the
experiment [...] series of trials, the probability of a [...]

The probability then that the farmer will select 4 cans with 
water and 1 empty can may be assumed to be 25/252 giving a P 
of approximately .10. In terms of the significance standard we 
have agreed upon, the experimenter would thus regard this out- 
come as offering no basis for rejecting the hypothesis of chance 
selection. Not only, however, should the probability of the 
selection of 4 water-filled cans and 1 empty can be taken into con- 
sideration in evaluating this outcome, but also the probability of 
the selection of 5 water-filled cans; for any result of 4 or more 
“correct” selections bears upon the claim of the farmer. Thus, 
we would say that the probability of the farmer making 4 or more
"correct" selections is (25 + 1)/252 = 26/252, or approximately .103.

It is perfectly clear that within the scope of this particular 
experimental design only a perfect selection by the farmer would 
result in the rejection of the hypothesis of chance selection. The 
design permits no possibility of error and, in this respect, may be 
considered too demanding. 
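The figures for a 4-out-of-5 selection can be reproduced in the same way. This is a sketch, not the author's procedure. It counts the sets by choosing the water-filled cans and the empty cans independently and multiplying the two counts. The helper name `ways` is our own.

```python
from fractions import Fraction
from math import comb

WATER, DRY, CHOSEN = 5, 5, 5           # cans with water, empty cans, cans named
TOTAL = comb(WATER + DRY, CHOSEN)      # 252 possible sets of 5

def ways(correct):
    """Number of sets of 5 containing exactly `correct` water-filled cans."""
    # Choose the water cans and the dry cans independently, then multiply.
    return comb(WATER, correct) * comb(DRY, CHOSEN - correct)

p_exactly_4 = Fraction(ways(4), TOTAL)            # 25/252
p_4_or_more = Fraction(ways(4) + ways(5), TOTAL)  # 26/252

print(ways(4), float(p_exactly_4), float(p_4_or_more))
```

Since 26/252 (about .103) exceeds .05, the sketch agrees that nothing short of a perfect selection reaches the agreed standard in this design.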


6. INCREASING THE SENSITIVITY OF THE EXPERIMENT 


We shall examine briefly a variation of the experimental 
design that would permit the farmer to make an error and still 
permit the rejection of the hypothesis that his choice was based 
upon chance alone. 

In this variation 10 additional cans are obtained, and the 
20 cans are arranged in 10 pairs. One of each pair is filled with 
water, and which is to be filled is again determined by some 
random method. The farmer is told that he will be presented with 
10 pairs of cans, 1 of which is filled with water and 1 of which is 
empty, and that he is to select the member of each pair with 
water. What are the possible outcomes of this experiment? 

There are 2 ways in which the farmer’s first choice may be
made, and, independently of this choice, the second selection may
be made in 2 ways, and, independently of this choice, there are 2
ways in which the third choice may be made, and so on for the 10
choices. Thus, there are a total of (2)(2)(2)(2)(2)(2)(2)(2)(2)(2)
= 1,024 ways in which the farmer may make his selections. Each
choice may be judged "right" if the can containing water is
selected and "wrong" if the empty can is selected, so that the
possible results of the experiment may be recorded as 10, 9, 8, 7, 6,
5, 4, 3, 2, 1, and 0 right.

7 With knowledge, which the farmer has, that 5 cans are empty and 5
are filled with water, an error in one direction, we may assume, will be
compensated for by an error in the other direction.

8 If an event can occur in a different ways and this can be followed by
another event which can happen in b ways, and if the two events are
independent, then the two events can occur together in ab ways. See, for
example, any elementary algebra text.

There is only 1 way in which 10 right may be selected, i.e., 
the water-filled can would have to be selected in every pair of the 
10 pairs presented, and this could occur in only 1 way. A selec- 
tion of 9 right from the 10 pairs presented may be made in 10 ways. 
The number of ways in which each of the other possible results 
may occur is given by formula (7) below. This is the formula for
the number of permutations of n things (choices) of which n₁ are
alike (right) and n₂ are alike (wrong). Thus,

$$ \frac{n!}{n_1!\,n_2!} \qquad (7) $$

In the present instance, formula (7) is equal to the number of
combinations of n things taken r at a time. The number of ways
in which a score of 10, 9, 8, 7, . . . , 0 right may be obtained is thus
based upon 10 right and 0 wrong, 9 right and 1 wrong, 8 right and
2 wrong, 7 right and 3 wrong, and so on. Therefore, we have

$$
\begin{aligned}
\frac{10!}{10!\,0!} &= 1, & \frac{10!}{9!\,1!} &= 10, & \frac{10!}{8!\,2!} &= 45, \\
\frac{10!}{7!\,3!} &= 120, & \frac{10!}{6!\,4!} &= 210, & \frac{10!}{5!\,5!} &= 252, \\
\frac{10!}{4!\,6!} &= 210, & \frac{10!}{3!\,7!} &= 120, & \frac{10!}{2!\,8!} &= 45, \\
\frac{10!}{1!\,9!} &= 10, & \frac{10!}{0!\,10!} &= 1
\end{aligned}
$$


With repetitions of the experiment, therefore, we would 
expect a set of 10 right and 0 wrong choices to occur with a fre- 
quency approaching 1/1,024, a set of 9 right and 1 wrong choices 
with a frequency approaching 10/1,024, a set of 8 right and 2 
wrong choices with a frequency approaching 45/1,024, and so on, 
if nothing but chance is determining the selection. The value of 
P for 10 correct choices is thus 1/1,024 or approximately .001 
(more precisely, .00098) or about 1 time in 1,000. This value of
P is less than the value of .05, which we agreed to regard as sig- 
nificant, and hence if the farmer made 10 correct choices, the 
hypothesis of chance selection would be rejected. 
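The counts 1, 10, 45, . . . that lie behind these frequencies can also be confirmed by brute force, because 1,024 records is a small enough number to list. The sketch below is illustrative and the names in it are ours. It enumerates every right/wrong record with `itertools.product`, tallies the number right, and compares each tally with formula (7).

```python
from collections import Counter
from itertools import product
from math import factorial

N_PAIRS = 10

# Every possible record of the farmer's 10 choices: 1 = right, 0 = wrong.
records = list(product((0, 1), repeat=N_PAIRS))
tally = Counter(sum(record) for record in records)

def arrangements(n, n1):
    """Formula (7): orderings of n choices, n1 of them right and n - n1 wrong."""
    n2 = n - n1
    return factorial(n) // (factorial(n1) * factorial(n2))

print(len(records))  # 1024
for right in range(N_PAIRS, -1, -1):
    print(right, tally[right], arrangements(N_PAIRS, right))
```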

If the farmer made 9 correct choices, we would find that the 
probability of this would be given by 10/1,024 yielding a P of .01 
approximately (more precisely, .00977), and the hypothesis of 
chance selection would be rejected also. But here again in 
evaluating a choice of 9 correct and 1 wrong, we should take into 
account any result which would be more favorable to the farmer's 
claim also. Thus, the probability of 9 or more correct choices 
would be given by the sum of the ways in which 9 and 10 correct 
choices may be made divided by the total number of ways. We 
find, therefore, that the probability of 9 or more correct choices 
is 11/1,024 or .011 (more precisely, .01074). Similarly, the proba- 
bility of 8 or more correct choices by chance alone is given by the 
sum of the ways in which 8, 9, and 10 choices may be made, 
divided by the total number of ways, or 56/1,024 which yields a 
P of .055 (more precisely, .05469). Seven or more correct choices 
would give a probability of 176/1,024 or .172 (more precisely, 
.17187). Seven correct selections upon the part of the farmer 
would thus offer no evidence against the hypothesis tested, 
namely, that the choice is a matter of chance, in terms of the 
standard agreed upon. 
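The cumulative probabilities in the preceding paragraph are sums of counts divided by 1,024. The short sketch below reproduces them under the design's assumption that p = q = ½. The helper name `ways_at_least` is hypothetical.

```python
from math import comb

N_PAIRS = 10
TOTAL = 2 ** N_PAIRS  # 1,024 equally likely records under the chance hypothesis

def ways_at_least(k, n=N_PAIRS):
    """Number of records with k or more right: a sum of the counts from formula (7)."""
    return sum(comb(n, r) for r in range(k, n + 1))

for k in (10, 9, 8, 7):
    ways = ways_at_least(k)
    print(f"{k} or more right: {ways}/{TOTAL} = {ways / TOTAL:.5f}")
```

Only 9 or more right falls at or below .05, which is the sense in which this design tolerates a single error.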

It may be noted that this particular design permits the 
farmer to make 1 error, something which the other experimental
design did not, and still permits the experimenter to
reject the hypothesis of chance. In this sense, the present ex- 
periment may be said to be more sensitive than the one described 
earlier. 

Let us suppose that the farmer makes a sufficiently large
number of correct choices so that the hypothesis of chance selection
is rejected. The same care as to what this means must be given
here as was given in the case of the experimental design previously
discussed. The enumerated sources of difficulty would be a
problem here also.


7. THE BINOMIAL EXPANSION 


The evaluation of the results of the experiment just described 
may be considered from a slightly different point of view. From 
elementary algebra, we find that if P(a) is the probability that 
event a will occur, and if P(b) is the probability that event b will 
occur, then the probability that either a or b will occur, if the two 
events are mutually exclusive, is equal to the sum of the proba- 
bilities of a and b. This is the rule that we made use of in evalu- 
ating the probability of a score of 9 or greater in the experiment 
described. Since in a single trial with 10 pairs of choices, the 
farmer may obtain a score of either 9 or 10 correct, and these are 
mutually exclusive in the sense that if he obtains one, he cannot 
obtain the other, then the probability of a score of 9 or 10, in other 
words, 9 or higher, is 1/1,024 + 10/1,024 or 11/1,024. 

It can also be shown that if P(a) is the probability that event
a will occur and P(b) the probability that event b will occur, then,
if the 2 events are independent, the probability that both events
will occur is equal to the product of their separate probabilities.
Thus, in the experiment described, the probability of a correct
choice in the first pair of cans presented was ½, and the probability
of a correct choice in the second pair of cans presented was also
½.⁹ The probability, then, that the first 2 choices would be
correct was equal to (½)(½), or ¼.

Now if we let p equal the probability that a correct choice
will occur with the first pair of cans presented and let q equal
1 − p to indicate the probability that the correct choice will not
occur, then the probability of the farmer making r successive
correct choices followed by n − r incorrect choices will be given
by the stated rule for independent events, or by

$$ \underbrace{(p\,p\,p \cdots p)}_{r}\;\underbrace{(q\,q\,q \cdots q)}_{n-r} = p^{r} q^{n-r} \qquad (8) $$

From (8) above, the probability of the first 8 choices being correct
and the last 2 choices being incorrect is

$$ \left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right)\cdots\left(\tfrac{1}{2}\right) = \left(\tfrac{1}{2}\right)^{8}\left(\tfrac{1}{2}\right)^{2} = \frac{1}{1{,}024} $$

9 This is the theoretical relative frequency to be expected for an
indefinitely large number of reactions to a single pair, assuming the
reaction is determined solely by chance.


But the probability of r correct choices and n − r incorrect
choices in some other order than that given above is the same,
$p^{r}q^{n-r}$, with simply a different arrangement of the p's and q's.
And by the previous rule for the permutations of n things of which
n₁ are alike and the remaining n₂ are alike, we have found that the
number of such arrangements is simply the number of combinations
of n things taken r at a time. Thus, the number of ways in which
the farmer could make 8 correct and 2 wrong choices would be equal to

$$ {}_{10}C_{8} = \frac{n!}{r!\,(n-r)!} = \frac{10!}{8!\,2!} = 45 $$


Now these 45 arrangements are mutually exclusive. We
know also that the probability of any one of a given number of
mutually exclusive events occurring is the sum of the probabilities
of the separate events. We have just seen that the probability
of a particular one of the 45 arrangements occurring is (½)⁸(½)²,
and, hence, the probability that any one of them may occur will
be (½)⁸(½)² summed 45 times. Furthermore, since (½)⁸(½)²
is a constant, this amounts to multiplying (½)⁸(½)² by 45. Thus,
for the probability of obtaining exactly r successes and n − r
failures in n independent choices (trials) for which p is a constant
probability of success from trial to trial, we have

$$ P(r) = {}_{n}C_{r}\, p^{r} q^{n-r} = \frac{n!}{r!\,(n-r)!}\, p^{r} q^{n-r} \qquad (9) $$
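Formula (9) translates directly into a small function. The sketch below uses exact fractions so that the result can be compared with the hand arithmetic. The function name `binomial_probability` is our own, not the author's. The example checks that one particular order of 8 right and 2 wrong has probability 1/1,024 and that the 45 orders together give 45/1,024.

```python
from fractions import Fraction
from math import comb

def binomial_probability(r, n, p):
    """Formula (9): probability of exactly r successes in n independent trials."""
    q = 1 - p
    return comb(n, r) * p**r * q**(n - r)

half = Fraction(1, 2)

# Formula (8): one particular order, 8 right followed by 2 wrong.
one_order = half**8 * half**2

print(one_order)                          # 1/1024
print(binomial_probability(8, 10, half))  # 45/1024, i.e. 45 orders of equal probability
```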


Or, in general, the relative frequency of n, n − 1, n − 2, n − 3,
and so on, correct choices will be given by the binomial expansion

$$ (p + q)^{n} = p^{n} + n p^{n-1} q + \frac{n(n-1)}{(1)(2)} p^{n-2} q^{2} + \frac{n(n-1)(n-2)}{(1)(2)(3)} p^{n-3} q^{3} + \cdots + q^{n} \qquad (10) $$

We could thus obtain, by means of the binomial expansion
above, a distribution of the relative frequencies or probabilities of
various numbers of right and wrong choices in the experiment
described. The probability, for example, of obtaining 10, 9, or 8
correct choices, i.e., 8 or more correct choices, would be given by
the sum of the probabilities for the first three terms of the expansion

$$ \left(\tfrac{1}{2}\right)^{10} + 10\left(\tfrac{1}{2}\right)^{9}\left(\tfrac{1}{2}\right) + \frac{(10)(9)}{(1)(2)}\left(\tfrac{1}{2}\right)^{8}\left(\tfrac{1}{2}\right)^{2} = \frac{1}{1{,}024} + \frac{10}{1{,}024} + \frac{45}{1{,}024} = \frac{56}{1{,}024} $$

8. AN EXPERIMENT ON TASTE

We may apply the binomial expansion to another problem.
Suppose that we wish to determine whether orange juice can be
distinguished from onion juice and apple juice when visual and
olfactory cues have been experimentally controlled.¹⁰ We block
the nasal passages of a subject and blindfold him. He is then
presented with a set of 3 test tubes. He is told that one of the
test tubes contains onion juice, one orange juice, and one apple
juice, and that he is to pick out the one which he thinks contains
the orange juice. Fifteen sets of 3 test tubes are presented to the
subject. How many correct choices will he have to make before
we are willing to reject the hypothesis that his reactions are
determined solely by chance? This will depend upon the possible
outcomes of the experiment and the relative frequency with which
a given number of correct choices may be expected to occur by
chance.

The probability p of making a correct choice in any single
trial is ⅓, and q is equal to 1 − ⅓, or ⅔. Assuming that this
probability remains constant from trial to trial, and with 15 trials,
we have¹¹

$$ \left(\tfrac{1}{3} + \tfrac{2}{3}\right)^{15} = \left(\tfrac{1}{3}\right)^{15} + 15\left(\tfrac{1}{3}\right)^{14}\left(\tfrac{2}{3}\right) + \frac{(15)(14)}{(1)(2)}\left(\tfrac{1}{3}\right)^{13}\left(\tfrac{2}{3}\right)^{2} + \frac{(15)(14)(13)}{(1)(2)(3)}\left(\tfrac{1}{3}\right)^{12}\left(\tfrac{2}{3}\right)^{3} + \cdots + \left(\tfrac{2}{3}\right)^{15} $$

The successive terms in the above expansion will give the
probabilities of 15, 14, 13, . . . , 0 correct choices under the
assumptions noted.

10 Additional experimental controls, for example, a mouth rinse in
between tastes, would be involved also.
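Expansion (10) can be generated term by term with exact fractions. This checks the arithmetic and also shows why the direct route grows tedious as n increases. The sketch is minimal, and the function name `expansion_terms` is our own. It lists the terms in the order written in (10), from $p^n$ down to $q^n$, for both experiments described so far.

```python
from fractions import Fraction
from math import comb

def expansion_terms(n, p):
    """Terms of (p + q)**n in the order written in (10), from p**n down to q**n."""
    q = 1 - p
    return [comb(n, r) * p**r * q**(n - r) for r in range(n, -1, -1)]

taste = expansion_terms(15, Fraction(1, 3))   # taste experiment: 15 trials, p = 1/3
cans = expansion_terms(10, Fraction(1, 2))    # farmer: 10 pairs, p = 1/2

print(sum(taste))        # the terms cover every outcome, so they sum to 1
print(float(taste[0]))   # 15 correct: (1/3)**15, roughly .00000007
print(sum(cans[:3]))     # 10, 9, or 8 right: 56/1024, printed in lowest terms
```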

The labor involved in the direct expansion of the binomial
is considerable, even when the number of trials is as few as 15.
The desired probabilities may be obtained, however, by a technique
of calculation involving successive multipliers which greatly
reduces the work. We first obtain the value of $p^n$, and this
probability may be symbolized by P(n). Then the probability
given by each succeeding term in the binomial may be obtained
from the probability given by the term immediately preceding it,
by means of the relationship¹²

$$ P(n - i) = P(n - i + 1)\left(\frac{n - i + 1}{i}\right)\left(\frac{q}{p}\right) $$

where i may take any value from 1 to n.

Taking the taste-discrimination experiment, we find that
P(15) is equal to approximately .00000007. Then the probability
of 14 correct choices will be given by

$$ P(15 - 1) = P(14) = P(15 - 1 + 1)\left(\frac{15}{1}\right)\left(\frac{2/3}{1/3}\right) $$

which simplifies to

$$ P(14) = P(15)\left(\frac{15}{1}\right)\left(\frac{2}{1}\right) = (.00000007)(30) = .00000210 $$


11 This expansion is considerably simplified by a table of the binomial
coefficients such as may be found in “Mathematical Tables from Handbook of
Chemistry and Physics” (7th ed.), published by the Chemical Rubber
Publishing Co. of Cleveland. Additional tables contained in this inexpensive
book will greatly facilitate computations to be discussed later.

12 This is a variation of the method described by Hoel (1947, p. 40).


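The calculation of the complete set of probabilities is shown below:

$$
\begin{aligned}
P(15) &= \left(\tfrac{1}{3}\right)^{15} && = .00000007 \\
P(14) &= (.00000007)\left(\tfrac{15}{1}\right)(2) && = .00000210 \\
P(13) &= (.00000210)\left(\tfrac{14}{2}\right)(2) && = .00002940 \\
P(12) &= (.00002940)\left(\tfrac{13}{3}\right)(2) && = .00025480 \\
P(11) &= (.00025480)\left(\tfrac{12}{4}\right)(2) && = .00152880 \\
P(10) &= (.00152880)\left(\tfrac{11}{5}\right)(2) && = .00672672 \\
P(9) &= (.00672672)\left(\tfrac{10}{6}\right)(2) && = .02242240 \\
P(8) &= (.02242240)\left(\tfrac{9}{7}\right)(2) && = .05765759 \\
P(7) &= (.05765759)\left(\tfrac{8}{8}\right)(2) && = .11531518 \\
P(6) &= (.11531518)\left(\tfrac{7}{9}\right)(2) && = .17937910 \\
P(5) &= (.17937910)\left(\tfrac{6}{10}\right)(2) && = .21525492 \\
P(4) &= (.21525492)\left(\tfrac{5}{11}\right)(2) && = .19568629 \\
P(3) &= (.19568629)\left(\tfrac{4}{12}\right)(2) && = .13045753 \\
P(2) &= (.13045753)\left(\tfrac{3}{13}\right)(2) && = .06021117 \\
P(1) &= (.06021117)\left(\tfrac{2}{14}\right)(2) && = .01720319 \\
P(0) &= (.01720319)\left(\tfrac{1}{15}\right)(2) && = .00229376
\end{aligned}
$$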
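The successive-multiplier scheme is easy to mechanize. Doing so also shows how much the rounding of the starting value P(15) to .00000007 affects the hand-computed column. The sketch below is ours, not a transcription of the text's worksheet. It runs the recursion once from the exact $(\tfrac{1}{3})^{15}$ and once from the rounded value, and compares both with formula (9). The function name `successive_multipliers` is hypothetical.

```python
from math import comb

def successive_multipliers(n, p, start=None):
    """P(n), P(n-1), ..., P(0) from P(n - i) = P(n - i + 1) * ((n - i + 1) / i) * (q / p).

    If `start` is given it is used as P(n), e.g. a rounded hand value;
    otherwise the exact p**n is used.
    """
    q = 1 - p
    probs = {n: p**n if start is None else start}
    for i in range(1, n + 1):
        probs[n - i] = probs[n - i + 1] * ((n - i + 1) / i) * (q / p)
    return probs

n, p = 15, 1 / 3
exact = successive_multipliers(n, p)
hand = successive_multipliers(n, p, start=0.00000007)   # rounded start, as in the text

for r in range(n, -1, -1):
    direct = comb(n, r) * p**r * (1 - p) ** (n - r)       # formula (9)
    print(f"P({r:2d})  rounded start {hand[r]:.8f}   exact start {exact[r]:.8f}   direct {direct:.8f}")
```

Every multiplier is exact, so the hand column inherits only the error in its starting value. For example, it gives P(8) ≈ .0577 where the exact value is about .0574, a discrepancy too small to affect the conclusions that follow.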

From the probabilities obtained above, we see that a statistically
significant result (P = .03) would be indicated by 9 or more
correct choices. If the subject made 9 or more correct choices in 
the series of 15 trials, abiding by our standard of significance, the 
hypothesis of chance would be rejected. On the other hand, if the 
subject made only 8 correct selections, we should have to conclude 
that the outcome offered no evidence against the hypothesis of 
chance, since P = .09 is greater than the significance standard .05. 
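Finding the smallest significant score amounts to summing upper tails until the sum first falls to .05 or below. The sketch below does this with exact fractions. The function names are our own, and the .05 default simply encodes the standard agreed upon in the text. It returns None when even a perfect score would not meet the standard.

```python
from fractions import Fraction
from math import comb

def upper_tail(k, n, p):
    """Exact chance probability of k or more correct choices in n trials, from formula (9)."""
    q = 1 - p
    return sum(comb(n, r) * p**r * q**(n - r) for r in range(k, n + 1))

def smallest_significant_score(n, p, alpha=Fraction(5, 100)):
    """Fewest correct choices whose upper-tail probability is at or below alpha."""
    for k in range(n + 1):
        if upper_tail(k, n, p) <= alpha:
            return k
    return None  # even a perfect score would not meet the standard

third = Fraction(1, 3)
k = smallest_significant_score(15, third)
print(k, float(upper_tail(k, 15, third)), float(upper_tail(k - 1, 15, third)))
```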

The student may find it of interest to consider some of the
experimental controls that would be necessary in the present
experiment. The order in which the set of 3 juices was presented
for each trial would, of course, be randomized. Otherwise, if the
orange juice was always presented first, second, or third, the
subject’s choices might be made upon this basis rather than upon
taste, the factor in which we presume the experimenter is
interested. Similarly, the 3 juices should all be at the same
temperature, for slight differences in temperature might introduce a
systematic bias. For the same reason, the test tubes should all
be of the same size and shape. What other experimental controls
should be exercised? Should the juices be put through an
extremely fine strainer, for example, or would this depend upon the
interest of the experimenter?


9. EXAMPLES 


1. A rat is placed upon a Lashley-type jumping stand and 
through a series of trials is trained to jump always to the smaller 
of 2 squares. The right and left position of the smaller square is 
randomly alternated so that the experimenter has some confidence 
that the rat is not reacting to a position variable. The experi- 
menter is interested in determining whether the established reaction 
pattern will be generalized to the extent that the rat will react 
similarly to the smaller of 2 circles. After the rat has learned the
discrimination of the 2 squares, it is given a series of 8 trials with
2 circles. The position of the smaller of the 2 circles is randomly 
alternated. We make the assumption that if generalization of 
the previous learning is not present, the rat will react to the 2 
circles on the basis of chance. On the other hand, if the rat jumps 
to the smaller circle with a frequency greater than we are willing 
to attribute to chance, this hypothesis will be ruled out, and we 
shall probably draw the inference that generalization has taken 
place. Following the practice suggested, the hypothesis of chance 
reaction will be considered untenable if the probability of the 
frequency of reactions to the smaller of the 2 circles is .05 or less. 
(a) How many jumps to the smaller of the 2 circles must the rat 
make in 8 trials if the hypothesis of chance reaction is to be 
rejected? (b) If the number of trials is increased to 12, how many 
jumps to the smaller circle must be made before the hypothesis of
chance reaction is considered untenable? 

2. It is claimed that infants stimulated by a loud sound show 
a response pattern that is differentiated from the pattern of 
response present when movements are restrained. Response to 
loud sound is said to be that of "fear" and response to restraint is 
said to be that of "rage." An infant is stimulated 4 times by a
loud sound and 4 times by restraint of movement, and motion 
pictures are taken of the responses immediately after stimulation. 
Photographs are made from the film and printed in strips. There 
are 8 strips, 4 showing reaction to sound and the other 4 to re- 
straint. This is explained to the subjects who are to serve as 
judges. The subjects are asked to pick out the set of 4 photographs
of rage. The questions which follow are related to the
evaluation of the possible results of the experiment, under the 
hypothesis that the correct selection is a matter of chance. (a) 
What is the probability that a single subject will select the set of 
4 correct photographs from the 8? (b) What is the probability 
of selecting a set of 3 correct and 1 wrong? (c) What is the 
probability of selecting a set of 2 correct and 2 wrong? (d) 
Suppose that the experiment had made use of 12 photographs, 6 of 
rage and 6 of fear. Then what would be the largest number of 
wrong selections that a subject could have in the set of 6 selected 
in order for the hypothesis of chance still to be rejected? 




3. An experimenter has a problem concerning the appre- 
hension of words. The type size in which the words are printed 
is varied in 3 ways; the background is varied in 2 ways through 
the use of 2 different colors. If the exposure time is varied in 3 
ways, then how many different combinations of the experimental 
variables will be involved? 

4. An experimenter makes up a set of 4 cards, 3 of which are 
blank and 1 of which has an X printed on it. The cards are 
shuffled and placed face down in a row. The subject is to de- 
termine the position of the card with the X on it. (a) What is 
the probability that the subject will make a correct selection in a 
single trial, assuming that he is reacting by chance? (b) If the 
subject is given 4 trials, what is the probability that he will 
obtain precisely 3 correct by chance? (c) What is the probability
that he will obtain precisely 2 wrong and 1 right in 3 trials? (d)
If there are 128 subjects who serve in the experiment and each
subject is given 3 trials, then how many subjects would we expect
to get perfect scores by chance? (e) How many of the 128 
subjects would be expected to obtain scores of 2 or more correct 
by chance? 

5. An experimental situation involves selecting 1 card from 
a set of 4. Of the cards 3 contain exactly the same shade of gray 
and the fourth differs from the 3 others by an amount equal to 
the average threshold difference for a group of 100 subjects. Let 
us assume that a subject has no ability to detect the difference 
between the 1 card and the 3 others, i.e., that his selections will 
be upon the basis of chance. The subject is to be given 13 trials. 
By means of the successive multipliers noted in the chapter, find 
the minimum number of correct selections necessary in order for 
the experimenter to reject the hypothesis of chance. 


CHAPTER 4 


The Normal and χ² Approximations of
the Binomial Probabilities 


1. INTRODUCTION 


The formulas and methods of direct calculation of proba- 
bilities described in the previous chapter are useful in evaluating 
the outcomes of a variety of simple experiments. But, as evi- 
denced by the experiment on taste, the calculations, even with 
the short cuts suggested, become extremely laborious with an 
increasing number of trials. In this chapter, we shall examine
some approximation methods which might be used under specified
conditions.

Let us suppose that in the experiment where the farmer was
presented with 10 pairs of cans, 1 of each pair being filled with
water and 1 being empty, the essential conditions remain the same,
but 40 pairs of cans are presented instead of 10. To evaluate the
outcomes of this experiment, the expansion of (½ + ½)⁴⁰ would
be necessary. The probabilities corresponding to the successive
terms of this expansion could, of course, be obtained by the method
described on page 46, but even so the calculations would be quite
tremendous. Fortunately, the probabilities obtained by the
binomial can be approximated quite satisfactorily in many practical
problems by means of the table of the normal curve.¹

Since we have already obtained by the binomial the proba- 
bilities attaching to the outcomes of the experiment where the 
farmer was given 10 pairs of cans to judge, we shall take this same 
experiment to illustrate the approximation method. We shall
then have a basis for evaluating the degree to which the
probabilities obtained by the approximation method are comparable to
those obtained by the binomial. In essence, it will be recalled,
the experiment consisted of presenting to the farmer 10 pairs of
cans. On the hypothesis of chance reaction at each presentation,
the probability that the farmer will select the can with water was
½, and the possible outcomes of the experiment could be evaluated
by (p + q)¹⁰.

1 Table III, Appendix.

It will be convenient to speak of each presentation of a pair
of cans as an event, the outcome of which may be recorded as a
success (correct choice) or a failure (incorrect choice). The set of
10 choices which we assume, as before, are independent, we shall
refer to as a sample of n choices. Let us suppose that this particular
experiment is repeated N times. Then, if the results (outcomes)
are in accordance with the binomial expansion, the observed
frequencies of samples of 10, 9, 8, . . . , 0 successes will be given by
the successive terms of the expansion N(p + q)ⁿ, where N equals
the number of samples and n equals the number of events (choices)
in each sample.

[Fig. 1. Theoretical frequency distribution of the expansion N(p + q)ⁿ,
with N equal to 1,024, p equal to .5, q equal to .5, and n equal to 10.
Horizontal axis: number of correct choices (0 to 10); vertical axis: frequency.]

A histogram has been drawn (Figure 1) to represent the 
theoretical frequency distribution when the experiment has been 
repeated 1,024 times, i.e., when N equals 1,024. The area under 
each column of the histogram may be considered as being made 
up of small rectangles, each rectangle having the same area. 
Thus, in the column corresponding to 10 correct choices, we have 
but 1 rectangle; in the column corresponding to 9 correct choices, 
we have 10 such rectangles; in the column corresponding to 8 
correct choices, we have 45 such rectangles; and so on. The total
number of rectangles will be equal to N or 1,024, and we may say
that the total area under the histogram corresponds to N. Then
the area under any given column of the histogram when divided
by the total area will give the theoretical relative frequency with
which samples with that particular number of correct choices may
be expected to occur. We have agreed to regard probability as a
theoretical relative frequency, and hence the probability of 8 or
more correct choices would be given by the area of the columns
corresponding to 8, 9, and 10 correct choices divided by the total
area.


2. THE NORMAL DISTRIBUTION 


Most students are familiar with the bell-shaped, symmetrical 
frequency distributions of test scores, weights, or heights which 
appear in textbooks. Such distributions are often, indiscrimi- 
nately, labeled “normal” distributions. The binomial distribution, 
shown in Figure 1, appears to be a normal distribution, and, indeed, 
it approximates a normal distribution very well. The theoretical 
binomial distribution, however, deals with the distribution of a 
discrete variable, whereas the normal distribution, which is also 
a theoretical distribution, deals with a continuous variable. In 
mathematical statistics, the theoretical distribution function of a 
continuous variable is represented by a curve rather than by a 
histogram. For the normal distribution, the equation of this 
curve may be written 

$$ y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X - m}{\sigma}\right)^{2}} \qquad (11) $$

The symbol m is the mean, and the symbol σ is the standard
deviation of the distribution. These symbols are used instead of
X̄ and s because they are parameters, known exactly, and are not
estimated from any given data. The symbols π and e may already
be familiar: π is the ratio of the circumference of a circle to its
diameter and is approximately 3.14159; e is the base of the system of
natural logarithms and is approximately 2.71828. The equation for the
normal curve need not be considered in detail here.² It is sufficient
for our purpose that a table has been prepared giving (1) the
ordinate of the curve corresponding to any given value of (X − m)/σ;
(2) the proportion of the total area to the left of the ordinate;
(3) the proportion of the total area to the right of the ordinate;
and (4) the proportion of the total area falling between the ordinate
and the mean. Table III in the Appendix is such a table. Since
the curve is symmetrical, only the values for the right half are
given in Table III. The first column gives the value of (X − m)/σ,
and the last column gives the value of the ordinate y erected at the
point (X − m)/σ. The other columns give the proportions of the
total area, previously enumerated, in relation to this ordinate.

2 A more detailed discussion of the normal distribution may be found in
any elementary mathematical statistics book. See, for example, Hoel (1948)
or Yule and Kendall (1947).

Let us make some assumptions and see where they lead us. 
Let us suppose that the sampling distribution of a given statistic 
X is normal in form with mean m and standard deviation σ. Then
subtracting a constant, let us say the mean of the sampling dis-
tribution, from every statistic in the distribution will not alter the
relative frequencies of the various values, but merely change
the manner in which the values are expressed. The values of the
statistics will now be given in terms of deviations from the mean
of the distribution. Nor will dividing each of these deviations by
a constant, say the standard deviation of the distribution, change
the relative frequencies. The value obtained in this way for each of
the statistics in the sampling distribution may be defined as z.
In terms of a formula, we have

$$z = \frac{X - m}{\sigma} \qquad (12)$$

and hence z will also be normally distributed if X is distributed
normally.
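As a minimal sketch of formula (12), the Python function below standardizes a value. The name `z_score` and the distribution with mean 50 and standard deviation 10 are hypothetical, chosen only for illustration.

```python
def z_score(x: float, m: float, sigma: float) -> float:
    """Formula (12): express x as a deviation from the mean m,
    in units of the standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - m) / sigma


# Hypothetical distribution with mean 50 and standard deviation 10.
for x in (30.0, 50.0, 60.0, 75.0):
    print(x, z_score(x, m=50.0, sigma=10.0))
```

Because the transformation only subtracts one constant and divides by another, it relabels the values without changing their relative frequencies.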

The value of this transformation is that z possesses certain 
properties that enable us to deal conveniently with any normal 
distribution, regardless of the particular mean and standard 
deviation of the distribution. The relation of z to the expression
(X − m)/σ in the equation for the normal curve may be noted. If in
this equation we set X equal to the mean and solve for y (taking σ
equal to 1, so that X is expressed in standard-deviation units), we
find y to be equal to .3989. If we set X so that it is one standard
deviation above the mean and solve for y, we find this to be equal
to .2420. By reference to Table III, we also find that between
the mean and the ordinate erected at this point, the proportion
of the total area under the curve will be equal to .3413. The
proportion of the total area under the curve falling to the right of
this ordinate will be .1587, and the proportion of the total area
falling to the left of this ordinate will be .8413. Since the curve
is symmetrical, we also know that the proportion of the total area
falling between a value of X which is 1 standard deviation above
the mean and a value of X which is 1 standard deviation below the
mean will be equal to (2)(.3413) or .6826.
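The entries of Table III can also be computed rather than looked up. The following Python sketch assumes only the standard library's `math.erf`. The function name `table_iii_row` is hypothetical, and the values should agree with those quoted above to four decimals.

```python
import math


def table_iii_row(z: float) -> dict:
    """Quantities tabled for the unit normal curve at relative deviate z >= 0."""
    ordinate = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
    left = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # area to the left of the ordinate
    return {
        "ordinate": ordinate,
        "left": left,
        "right": 1 - left,       # area to the right of the ordinate
        "mean_to_z": left - 0.5,  # area between the mean and the ordinate
    }


row = table_iii_row(1.0)
for name, value in row.items():
    print(f"{name:>10}: {value:.4f}")
print(f"within +/-1 sigma: {2 * row['mean_to_z']:.4f}")
```

Doubling the unrounded area between the mean and 1 standard deviation gives .6827. The .6826 in the text is twice the rounded table entry .3413, so the difference is one of rounding only.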

How does this discussion relate to the problem we originally 
set out to investigate? Let us see if we can clarify the matter
somewhat. If we know that a variable X is normally distributed
with mean and standard deviation also known, and if we select at
random from this distribution values of X, the table of the normal
curve will specify the relative frequency with which given values
of X may be expected to occur. We have just seen, for example,
that .6826 of the total area under a normal curve will fall between
the ordinates corresponding to a value of X which is 1 standard
deviation above the mean and a value of X which is 1 standard
deviation below the mean. Since this is the relative frequency
with which values of X between these two points may be expected
to occur, assuming chance selection, it is also a statement of
probability. The probability, we say, that a randomly selected
value of X will fall within 1 standard deviation of the mean, in
either direction, is .6826.

Can we determine a value of X such that the probability of obtaining
that value or a larger one is .05? Indeed we can, for by reference
to Table III we find
that an ordinate erected at a value of X which is approximately
1.65 standard deviations above the mean will cut off a proportion
of the total area in the right tail of the normal curve which is equal
to .05. Using the same reasoning as we did before, we may say
that the probability of obtaining by random selection a value of
X which is 1.65 or more standard deviations above the mean of the
distribution is .05. Similarly, the ordinate erected at a value of
X which is 1.65 standard deviations below the mean will cut off a
proportion of the total area in the left tail of the normal curve which
is equal to .05. Thus, the probability of obtaining a value of X
by random selection which is 1.65 or more standard deviations
below the mean will also be equal to .05.
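The deviate that cuts off a given tail area can be found with the inverse of the normal distribution function. This sketch assumes Python 3.8 or later for `statistics.NormalDist`. It should show two things: the exact deviate is about 1.645, which the text rounds to 1.65, and the area beyond 1.65 itself is slightly under .05.

```python
from statistics import NormalDist

unit_normal = NormalDist(mu=0.0, sigma=1.0)

# Relative deviate that cuts off .05 in the right tail.
z_upper = unit_normal.inv_cdf(0.95)
# By symmetry, the deviate cutting off .05 in the left tail.
z_lower = unit_normal.inv_cdf(0.05)

print(round(z_upper, 3), round(z_lower, 3))
# Right-tail area beyond the rounded value 1.65.
print(round(1 - unit_normal.cdf(1.65), 4))
```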


3. RELATION OF THE BINOMIAL TO THE NORMAL DISTRIBUTION 


Now, if we can assume that the distribution of “correct” 
solutions by the farmer, in the experiment in which we are inter- 
ested, follows a normal distribution for an indefinitely large 
number of repetitions of the experiment, and if we know the mean 
and standard deviation of this distribution, then we should be 
able to set a particular value of X (frequency) for which the 
probability may also be determined by the table of the normal 
curve. This frequency, in terms of our discussion, should be one
which would be 1.65 standard deviations above the mean.³ If the
farmer selects correctly a frequency of successes which is equal to
or exceeds this value, this outcome will have a
probability of .05 or less, and hence the hypothesis that he is
reacting by chance will be rejected.

One difficulty which stands in our way is that the binomial 
distribution is discrete and the normal distribution is continuous. 
However, we may correct for the discontinuity of the binomial by 
treating the frequencies of correct selections in terms of an
underlying parallel continuum.⁴ Thus, a frequency of 5 correct
selections may be regarded as occupying an interval ranging from 4.5
up to 5.5; a frequency of 8 correct choices may be regarded as 
occupying an interval ranging from 7.5 up to 8.5; and so on. 

Another difficulty, and one which proves to be more serious
in particular cases, is the degree to which the binomial distribution
does approximate a normal distribution. If the number of events
n in each sample were quite large, we might regard the distribution
of frequencies given by N(p + q)ⁿ as being rescaled to the same
baseline as Figure 1. The columns of the histogram would thus
become narrower and narrower as n increased and the discrete
appearance of the histogram would take on the appearance of a
continuous curve. In the limiting case, as n becomes indefinitely
large, the binomial approaches the normal distribution.

³ Note that our interest is in the claim of the farmer to be able to select
“correctly,” in a series of trials, a frequency which will exceed that to be
expected by chance. This and only this outcome will enable us to infer that
the farmer does possess some ability, provided, of course, that our experi-
mental controls are adequate. If the farmer should select “correctly” a
frequency that is below the frequency expected by chance, i.e., a frequency
which is 1.65 standard deviations below the mean, this outcome would not
enable us to infer that he possesses the ability he claims, although it is just as
unusual or improbable an outcome as a frequency which is 1.65 standard devia-
tions above the mean. This point is discussed in greater detail on pp. 77-79.

⁴ See the earlier discussion on p. 4.

But what of the case at hand, where n is equal only to 10?
We have mentioned before that, even here, the binomial distri-
bution of Figure 1 approximates a normal distribution fairly well.
Let us find the mean and standard deviation of the binomial
distribution of Figure 1. We may then use the table of the normal
curve to obtain the probability of a given number of correct choices,
under the assumption of normality. By comparing this proba-
bility with that given exactly by the binomial, we shall have some
indication of the goodness of our approximation.


4. PARAMETERS OF THE BINOMIAL DISTRIBUTION 


For the binomial distribution N(p + q)ⁿ, it can be shown
that the mean frequency of successful events (correct choices) will
be given by

$$m = np \qquad (13)$$

Now the value obtained by formula (13) above is not a sample
estimate, but is the mean of the theoretical sampling distribution
and hence may be taken as the value of the parameter. It is given
exactly, once n and p have been specified, and we shall use the
symbol m to represent this parameter, rather than X̄, which we
reserve for a sample mean.

Similarly, it can be shown that the standard deviation
(standard error) of this distribution is given exactly once n and p
have been specified and does not involve any estimation. We
shall use σ rather than s to represent this parameter. Thus,

$$\sigma = \sqrt{npq} \qquad (14)$$

Formula (13) and formula (14) may be put in terms of the
proportion of successes rather than the frequency. For the mean
proportion we have

$$m = p \qquad (15)$$

and for the standard error of this proportion, we have

$$\sigma = \sqrt{\frac{pq}{n}} \qquad (16)$$
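Formulas (13) to (16) translate directly into code. In this sketch the function name `binomial_parameters` is hypothetical, and the inputs n = 10 and p = .5 are those of the farmer experiment.

```python
import math


def binomial_parameters(n: int, p: float) -> dict:
    """Mean and standard error of the binomial N(p + q)^n, formulas (13)-(16)."""
    q = 1 - p
    return {
        "mean_frequency": n * p,                # (13) m = np
        "sd_frequency": math.sqrt(n * p * q),   # (14) sigma = sqrt(npq)
        "mean_proportion": p,                   # (15) m = p
        "sd_proportion": math.sqrt(p * q / n),  # (16) sigma = sqrt(pq/n)
    }


# The farmer experiment: n = 10 choices, p = .5 on each.
params = binomial_parameters(10, 0.5)
print({k: round(v, 4) for k, v in params.items()})
```

The standard error of the proportion is the standard error of the frequency divided by n. This is why formulas (15) and (16) follow from (13) and (14).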


5. APPROXIMATION OF THE BINOMIAL PROBABILITIES FROM 
THE TABLE OF THE NORMAL CURVE 


Let us now raise the question as to the probability of the 
farmer making 9 or more correct choices by chance. The discrete 
frequency of 9 is regarded as occupying an interval, the lower 
limit of which is 8.5. The probability of 9 or more correct choices
will thus correspond to the proportion of the total area under the
normal curve falling to the right of an ordinate erected at the
point 8.5, as in Figure 2.


Fig. 2. Theoretical normal distribution with mean equal to 5.0
and standard deviation equal to 1.581. Shaded area is that falling
beyond an ordinate at 2.214 standard deviations.
[Figure 2: baseline marked at 5.0 and 8.5]


To determine the proportion of the total area falling to the 
right of the ordinate at 8.5, the proportion corresponding to the 
shaded area of Figure 2, we must first express 8.5 as a relative 
deviate or value of z. This is done in the manner described earlier,
by expressing 8.5 as a deviation from the mean of the distribution and
dividing this deviation by the standard deviation of the distri-
bution. Since n is 10 and p is .5, from formula (13) we find that
the mean of the distribution is equal to 5.0. The mean 5.0 is
regarded as a point, an exact value, and not as occupying an
interval. The standard deviation of the distribution will be given
by formula (14) and is equal to √((10)(.5)(.5)) = 1.581 (rounded).
Hence,

$$z = \frac{8.5 - 5.0}{1.581} = 2.214$$

We now enter the first column of Table III with the value of
2.214 and seek the tabled value corresponding to this. We find,
however, that the tabled values of z increase by hundredths and
that 2.21 is the nearest tabled value to 2.214. The proportion of
the total area falling to the right of the ordinate at 2.21 is .0136,
and, hence, we may say that the probability of 9 or more correct
choices is approximately .0136 as determined by the table of the
normal curve.
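The calculation of z for 8.5 can be checked in Python, assuming Python 3.8 or later for `math.comb` and `statistics.NormalDist`. This sketch uses the exact normal distribution function in place of Table III, so it skips the table lookup but otherwise follows the steps above. The exact binomial sum relies on p being .5, so that every sequence of 10 choices has probability 1/2¹⁰.

```python
import math
from statistics import NormalDist

n, p = 10, 0.5
m = n * p                           # formula (13)
sigma = math.sqrt(n * p * (1 - p))  # formula (14)

# Correction for continuity: the frequency 9 occupies 8.5 to 9.5, so use 8.5.
z = (8.5 - m) / sigma
normal_p = 1 - NormalDist().cdf(z)

# Exact binomial probability of 9 or more correct choices (valid because p = .5).
exact_p = sum(math.comb(n, k) for k in range(9, n + 1)) / 2**n

print(f"z = {z:.3f}, normal P = {normal_p:.4f}, binomial P = {exact_p:.4f}")
```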

By linear interpolation we may obtain a more exact answer.
We have no tabled value of z for 2.214, but we do have entries for
2.21 and 2.22, with corresponding proportions of the total area of
the curve to the right of these 2 values given as .0136 and .0132,
respectively. We may interpolate for the proportion to the right
of the ordinate at 2.214 as follows:⁵

$$\frac{2.214 - 2.210}{2.220 - 2.210}\,(.0132 - .0136) + .0136 = .01344$$
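Linear interpolation is a single formula, sketched below with a hypothetical helper `linear_interpolate`. It is applied both to the tail areas above and to the square-root example of footnote 5, which uses the tabled roots of 430 and 431.

```python
import math


def linear_interpolate(x: float, x0: float, x1: float, y0: float, y1: float) -> float:
    """Straight-line interpolation between tabled points (x0, y0) and (x1, y1)."""
    return (x - x0) / (x1 - x0) * (y1 - y0) + y0


# Right-tail areas tabled at z = 2.21 and z = 2.22.
tail = linear_interpolate(2.214, 2.21, 2.22, 0.0136, 0.0132)

# Footnote example: square root of 430.821 from the tabled roots of 430 and 431.
root = linear_interpolate(430.821, 430.0, 431.0, 20.7364, 20.7605)

print(round(tail, 5), round(root, 4), round(math.sqrt(430.821), 4))
```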


The probability of .013, obtained by means of the table of
the normal curve, for 9 or more correct choices may be compared
with the probability of .011 obtained directly from the binomial
expansion. In this instance no conclusion concerning significance
would be changed by use of the normal curve. In a similar
manner, we may determine that the probability of 8 or more
correct choices as given by the table of the normal curve is .057 as
compared with the probability obtained from the binomial of .055.

⁵ The process of linear interpolation should be familiar to students.
Another example may serve to illustrate the method described above.
Suppose that we wish to find the square root of 430.821. The table of
squares and square roots (Table II, Appendix) has entries for 430 and
431, with square roots of these numbers given as 20.7364 and 20.7605,
respectively. Thus, we may make a linear interpolation as follows:

$$\frac{430.821 - 430.000}{431.000 - 430.000}\,(20.7605 - 20.7364) + 20.7364 = 20.7562$$

By direct computation, we find that the square root of 430.821, rounded
to four decimals, is 20.7562. Other methods of interpolation, involving
higher-order differences, may be used to obtain more precise values, but
simple linear interpolation as described here will give fairly accurate
results. The more precise methods are described in Yule and Kendall (1947).

Let us take still another example. From the binomial, we 
find that the probability of exactly 5 correct choices will be given 
by 252/1,024, so that P equals .2461. In the normal distribution 
this probability will be given by the proportion of the total area 
of the curve falling between ordinates erected at 4.5 and 5.5. The
area falling between these 2 ordinates may be readily determined
by expressing the limits as relative deviates. Thus,

$$z = \frac{5.5 - 5.0}{1.581} = \frac{.5}{1.581} = .316 \qquad\text{and}\qquad z = \frac{4.5 - 5.0}{1.581} = \frac{-.5}{1.581} = -.316$$

By linear interpolation we find that the area falling between the
mean and a relative deviate of plus .316 is .124 and the area between
the mean and a relative deviate of minus .316 is also .124. The
sum of these 2 values, .248, gives the proportion falling between
4.5 and 5.5 in a normal distribution with mean equal to 5.0 and
standard deviation equal to 1.581. This probability of .248 may
be compared with that previously obtained by the binomial which
was .246.
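The comparison for exactly 5 correct choices can be reproduced as follows. This sketch assumes Python 3.8 or later (`math.comb`, `statistics.NormalDist`) and gives the normal curve the binomial's mean 5.0 and standard deviation 1.581.

```python
import math
from statistics import NormalDist

n, p = 10, 0.5
curve = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))

# Exact binomial probability of exactly 5 correct choices: 252/1,024.
exact = math.comb(n, 5) * p**5 * (1 - p) ** 5

# Normal approximation: area between the limits 4.5 and 5.5.
approx = curve.cdf(5.5) - curve.cdf(4.5)

print(f"binomial {exact:.4f}  normal {approx:.4f}")
```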

We thus see that in this particular problem the normal 
distribution provides a much simplified approach to the proba- 
bilities of the binomial and with a good approximation. In general, 
the normal curve tables will provide a satisfactory approximation 
to the binomial probabilities as long as np (or nq if q is less than p)
is equal to or greater than 5.0. The approximation will not be
greatly in error if the rule concerning np (or nq if q is less than p) 
is observed. The exact probabilities, of course, may always be 
obtained by the more laborious methods of the last chapter. 
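The rule of thumb can be written as a one-line check. The function name, the small floating-point tolerance, and the case n = 20, p = .1 are hypothetical additions for illustration. The other cases come from the experiments in this chapter.

```python
def normal_approximation_ok(n: int, p: float, threshold: float = 5.0) -> bool:
    """Rule of thumb: the smaller of np and nq should be at least 5."""
    q = 1 - p
    # A tiny tolerance guards against floating-point results such as 4.999999999.
    return min(n * p, n * q) >= threshold - 1e-9


for n, p in [(10, 0.5), (15, 1 / 3), (100, 0.5), (20, 0.1)]:
    print(n, round(p, 3), normal_approximation_ok(n, p))
```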


6. EVALUATION OF THE EXPERIMENT ON TASTE 


One additional comparison may be made. In the last 
chapter we found the probabilities corresponding to the outcomes 
of the experiment on taste. This was the investigation which was 
concerned with the ability of the subject to distinguish orange 
juice from apple and onion juice when visual and olfactory cues 


e 


2 


APPROXIMATIONS OF THE BINOMIAL PROBABILITIES 61 


were experimentally controlled. In this experiment, it was 
assumed that the subject, if unable to distinguish between the 3 
juices, would respond by chance and that the probability of his 
selecting the orange juice would be 1/3. The binomial expansion
showed that the probability of the subject making 9 or more 
correct choices would give a value of P equal to .031. This 
probability may be compared with that obtained from the normal 
curve approximation. 

With p equal to 1/3 and with n equal to 15, the mean frequency
will be equal to 5.0. The standard deviation will be equal to
√((15)(1/3)(2/3)) = 1.826. The probability of 9 or more correct
choices will be given by the proportion of the total area of the
normal curve falling to the right of the lower limit 8.5 of the
frequency of 9. Expressing 8.5 as a relative deviate we have

$$z = \frac{8.5 - 5.0}{1.826} = 1.917$$

From the table of the normal curve we find, by linear inter-
polation, that the proportion of the total area falling to the right
of an ordinate erected at 1.917 is .028. This does not differ too
greatly from the probability of .031 obtained from the binomial,
and no conclusion concerning significance would be changed by the
use of the table of the normal curve.
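For the taste experiment, the sketch below computes the exact binomial probability with `fractions.Fraction`, so that the 1/3 is carried without rounding, and compares it with the normal approximation. The variable names are illustrative assumptions, and Python 3.8 or later is assumed for `math.comb` and `statistics.NormalDist`.

```python
import math
from fractions import Fraction
from statistics import NormalDist

n, p = 15, Fraction(1, 3)  # 15 trials, chance probability 1/3 of choosing orange juice

# Exact binomial probability of 9 or more correct choices, kept as a fraction.
exact = sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(9, n + 1))

# Normal approximation with the correction for continuity (lower limit 8.5).
m = float(n * p)
sigma = math.sqrt(float(n * p * (1 - p)))
z = (8.5 - m) / sigma
approx = 1 - NormalDist().cdf(z)

print(exact, f"= {float(exact):.4f}")
print(f"z = {z:.3f}, normal P = {approx:.4f}")
```

The exact value, about .0308, rounds to the .031 of the binomial expansion. The normal value, about .0276, rounds to the .028 given above.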


7. A PROBLEM IN OPINION POLLING

Further applications of the table of the normal curve to
problems involving the binomial distribution may be given. Let
us suppose that a random sample of 100 cases from a defined
population is interviewed on a public opinion issue. Let us
suppose also that the probability of an “agree” response is 1/2.
This assumption amounts to saying that there is no difference
between the frequency of those who will respond “agree” and
those who will respond “disagree” to the issue, if the entire popu-
lation could be interviewed. But since we have but a sample of
100 cases from the population, some variation from the hypo-
thetical population parameter may be expected as a result of
chance or random selection. If N successive samples of 100 cases
each were drawn from the population, the relative frequency of
agree responses would be given by the expansion of the binomial
N(½ + ½)¹⁰⁰.

On the basis of the assumptions noted, we may ask whether
we have at hand an unusual sample if the frequency of agree
responses is 60. How frequently, in other words, would samples
with 60 or more agree responses occur in random samples of 100
cases selected from the population, assuming the probability of
an agree response to be 1/2? Corresponding to the experiment
with the farmer, where the 10 choices were considered to constitute
a sample of 10 events, so likewise the response of 100 people in the
opinion sample may be thought of as a sample of 100 independent
events, the probability of an agree response for each event being 1/2.

The mean number of agree responses in the sampling distri-
bution would be np or (100)(.5) = 50. The standard deviation
of the distribution would be given by √((100)(.5)(.5)) = 5.0.

Expressing the frequency of agree responses in the sample at
hand as a relative deviate, and remembering the correction for
continuity (the lower limit of the frequency 60 is 59.5), we have

$$z = \frac{59.5 - 50.0}{5.0} = 1.9$$

From the table of the normal distribution curve we find that
an ordinate erected at a point 1.9 standard deviations above the 
mean will cut off .0287 of the total area. Under the assumptions 
noted, therefore, we may say that the probability of obtaining a 
frequency of 60 or more agree responses in a random sample of 
100 cases selected from the defined population is approximately
.03. We might regard this particular sample, with a frequency
of 60 agree responses, as one of the approximately 3 in 100 to be
expected in random sampling from the population, or we might
question the hypothesis that the probability of an agree response
is 1/2. With assurance that the sampling was actually random,
most investigators would question the validity of the hypothesis.
If we abide by the significance standard of rejecting the hypothesis 
tested when the value of P is equal to or less than .05, the 
hypothesis would be rejected in the present instance, for the 
value of P is .03. 
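The opinion-poll test can be sketched as below, including the decision rule of rejecting when P is .05 or less. The names `p_value` and `alpha` are illustrative, not from the text, and Python 3.8 or later is assumed for `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

n, p = 100, 0.5
m = n * p                           # 50 agree responses expected
sigma = math.sqrt(n * p * (1 - p))  # 5.0

# 60 or more agree responses: the frequency 60 has lower limit 59.5.
z = (59.5 - m) / sigma
p_value = 1 - NormalDist().cdf(z)

alpha = 0.05
print(f"z = {z:.2f}, P = {p_value:.4f}, reject at .05: {p_value <= alpha}")
```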




8. THE χ² DISTRIBUTION


The table of the χ² distribution⁶ may be used to evaluate the
compatibility between a set of observed frequencies and the theo-
retical frequencies expected on the basis of some hypothesis. Since
the binomial distribution N(p + q)ⁿ yields a series of frequencies,
it may readily be surmised that the table of χ² may also be used
to approximate the probabilities given by the binomial distribution.
We shall illustrate the use of the χ² table by reference to the
experiment where the farmer was presented with the 10 cans of
water.

The total number of responses which the farmer makes is 10, 
and this is composed of 2 frequencies: the frequency of correct 
responses (successes) and the frequency of incorrect responses
(failures). It would seem logical to take as the expected frequency 
of correct responses in this experiment, the mean number based 
upon the hypothesis of a chance response to each pair, i.e., np or 
5.0. Now, if this is the expected frequency of correct responses, 
then the expected frequency of incorrect responses is determined, 
in that the total number of responses must equal 10. Only one 
of these 2 sets of frequencies is free to vary, and we say that only 
1 degree of freedom is available. If one expected frequency is set 
by hypothesis (np in this instance), then the other is determined 
by the data of the problem. The concept of degrees of freedom is 
an extremely important one, and it will be discussed in greater 
detail later in this chapter and also in subsequent chapters. For 
the present problem, we may say that it refers to certain restrictions
placed upon the assignment of the theoretical frequencies by
the nature of the problem and the hypothesis being tested. 

One form in which the formula for the calculation of χ² may
be written is

$$\chi^2 = \sum \frac{(o - e)^2}{e} \qquad (17)$$

where o is an observed frequency and e is the corresponding
expected frequency on the basis of the hypothesis being tested.

⁶ Table IV, Appendix.


It is important to note that the discrepancy between the observed 
and expected frequency of correct responses and the discrepancy 
between the observed and expected frequency of incorrect responses
must be taken into account in the calculation of χ², as
indicated by the summation sign. 

Let us raise the question again as to the probability of the 
farmer making 8 or more correct selections under the conditions 
previously described. If the farmer makes 8 correct choices, he 
will have to make 2 incorrect ones. These frequencies are
discrete values, and, since the χ² distribution is continuous, a cor-
rection for continuity similar to that made in using the normal
distribution will be in order. The correction is easily made by
reducing the absolute value of each (o − e) by .5, that is, by moving
each observed frequency .5 closer to its expected frequency. Then

$$\chi^2 = \frac{(7.5 - 5.0)^2}{5.0} + \frac{(2.5 - 5.0)^2}{5.0} = 2.5$$
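Here is a minimal sketch of formula (17) with the correction for continuity, written as a hypothetical helper `corrected_chi_square` that takes lists of observed and expected frequencies. The correction as coded simply subtracts .5 from each |o − e|. That suits this example because every discrepancy is larger than .5.

```python
def corrected_chi_square(observed, expected):
    """Formula (17) with the correction for continuity:
    sum of (|o - e| - .5)^2 / e over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))


# 8 correct and 2 incorrect choices, against 5.0 and 5.0 expected by chance.
print(corrected_chi_square([8, 2], [5.0, 5.0]))
```

Reversing the outcome to 2 correct and 8 incorrect gives the same 2.5, because χ² depends only on the squared discrepancies.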


Our problem now is the evaluation of the obtained χ² of 2.5
by means of the table of the χ² distribution (Table IV, Appendix),
just as we evaluated an obtained value of z by means of the table
of the normal distribution. But, whereas the table of the normal
distribution is one-dimensional, the table of the χ² distribution is
two-dimensional. In entering the table of the normal curve, for
example, we required only the value of z, but to enter the table of
χ² we need not only the value of χ² but also the number of degrees
of freedom on which χ² is based. The mathematical distribution
for χ² changes with the number of degrees of freedom and hence a
large number of separate tables corresponding to the table of the
normal distribution would be necessary, if the distribution of χ²
were to be tabled as completely as that of z. For this reason, the
table of χ² is constructed somewhat differently from the table of
z. The column headings, for example, give the proportion of the
total area under the curve falling to the right of ordinates erected
at the values of χ² given in the body of the table for the degrees of
freedom given at the extreme left. For example, for 1 degree of
freedom (row 1), we find that an ordinate erected at χ² equal to
3.841 will cut off .05 of the total area. As in the case of the normal
curve, we may interpret this as a probability.

We have obtained a χ² of 2.5. What is the probability of
obtaining a value as large as or larger than this under the hy-
pothesis being tested? We enter the table of χ² with 1 degree of
freedom, but, unfortunately, because of the reasons stated above,
we do not find a tabled entry for this value. This value of χ²
is somewhere between the two tabled entries 1.642 and 2.706,
which have corresponding probabilities of .20 and .10, respectively.
By linear interpolation into the table we may approximate the
probability for the χ² of 2.5. Thus,

$$\frac{2.500 - 1.642}{2.706 - 1.642}\,(.10 - .20) + .20 = .119$$

and this is the approximate probability of obtaining a value of χ²
as large as or larger than 2.5 for 1 degree of freedom.

Now the probability of 8 or more correct responses as determined
from the table of the normal curve was .057, whereas the
probability we have obtained above is .119. What is the basis of
this apparent discrepancy? It may be noted that if the farmer
had made 8 incorrect responses and 2 correct responses, and
if we had solved for χ² by means of formula (17), we would have
also obtained a χ² of 2.5. The evaluation of this same outcome
by means of the table of the normal curve would have resulted in
a z value of −1.581. χ² can only be positive in sign, whereas z
may be positive or negative in sign. From the table of the normal
curve, it may easily be determined that the probability of a z as
large as 1.581 in absolute value, positive or negative in sign, is
.057 + .057 = .114, and this result should suggest the answer for the
apparent discrepancy between the two tests of significance. The
proportion .119 of the total area under the curve falling beyond an
ordinate at χ² equal to 2.5 is, apart from errors of rounding and
interpolation, equal to the proportion of the total area in the normal
distribution marked off in both tails for the corresponding value of z,
that is, beyond the ordinates at 1.581 and −1.581. If
we wanted to know the probability of the farmer making 8 or more
correct responses or 2 or fewer correct responses, then the desired
probability would be given by the proportion of the area in the
normal curve falling beyond an ordinate at 1.581 plus the propor-
tion falling beyond an ordinate at −1.581.⁷ But for reasons dis-
cussed earlier in this chapter, we are interested only in the prob-
ability of the farmer making a number of correct choices which
will exceed chance expectancy. Hence, we desire only the proba-
bility corresponding to the area in the right tail of the normal dis-
tribution curve, i.e., the area falling beyond the ordinate at 1.581.
To obtain this probability from the χ² table, we take one-half of the
tabled probability. The desired probability will thus be equal to
.119/2 = .0595.⁸

⁷ The outcomes are mutually exclusive.
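For 1 degree of freedom the χ² tail area can be computed exactly. χ² is then the square of a unit normal deviate, so P(χ² ≥ x) equals erfc(√(x/2)). This sketch uses `math.erfc` and `statistics.NormalDist` (Python 3.8 or later). It shows that the χ² probability matches the two normal tails together and that half of it gives the one-tailed probability. The helper name `chi_square_sf_1df` is hypothetical.

```python
import math
from statistics import NormalDist


def chi_square_sf_1df(x: float) -> float:
    """P(chi-square >= x) for 1 degree of freedom, using erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2))


chi2 = 2.5
two_tailed = chi_square_sf_1df(chi2)
z = math.sqrt(chi2)                        # 1.581
one_tail_normal = 1 - NormalDist().cdf(z)  # right tail only

print(f"chi-square P = {two_tailed:.4f}")
print(f"both normal tails = {2 * one_tail_normal:.4f}")
print(f"one-tailed P (chi-square P / 2) = {two_tailed / 2:.4f}")
```

The exact two-tailed value is about .114, matching the .057 + .057 from the normal table. The .119 found above comes mostly from linear interpolation in the χ² table, which slightly overstates the tail area between widely spaced entries.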


9. THE RELATION BETWEEN z AND χ² FOR 1 DEGREE OF
FREEDOM


In the case of 1 degree of freedom, the distribution of χ² is
such that the square root of χ² is equal to z (apart from sign) and
z² = χ². This relationship makes possible the easy transformation
of χ² to z or z to χ². From formula (12) we know that

$$z = \frac{X - m}{\sigma}$$

and in the case of the binomial, this becomes

$$z = \frac{X - np}{\sqrt{npq}}$$

If we now square the above equation, we obtain, by substitution
in formula (17),

$$z^2 = \frac{(X - np)^2}{npq} = \sum \frac{(o - e)^2}{e} = \chi^2 \qquad (18)$$
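Formula (18) can be checked numerically for the taste experiment. In this sketch the helper names `corrected_z` and `corrected_chi_square` are hypothetical, and both apply the correction for continuity. z² agrees with χ² because the successes and the failures always have the same absolute discrepancy |X − np|.

```python
import math


def corrected_z(x: int, n: int, p: float) -> float:
    """(|X - np| - .5) / sqrt(npq): z with the correction for continuity."""
    return (abs(x - n * p) - 0.5) / math.sqrt(n * p * (1 - p))


def corrected_chi_square(x: int, n: int, p: float) -> float:
    """Formula (17), corrected, over the two categories: successes and failures."""
    observed = (x, n - x)
    expected = (n * p, n * (1 - p))
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))


# Taste experiment: 9 correct choices in 15 trials, p = 1/3.
z = corrected_z(9, 15, 1 / 3)
chi2 = corrected_chi_square(9, 15, 1 / 3)
print(f"z = {z:.3f}, z squared = {z * z:.3f}, chi-square = {chi2:.3f}")
```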


As an illustration of this relationship between χ² with 1
degree of freedom and z, we may evaluate the experiment on taste.
This was the experiment which involved the ability of the subject
to distinguish orange juice from apple and onion juice when visual
and olfactory cues were controlled. We found that by the direct
application of the binomial, a P of .031 was obtained for 9 or more
correct choices, and the use of the table of the normal curve gave
a P of .028. Applying the correction for continuity, we now
obtain

$$\chi^2 = \frac{(X - np)^2}{npq} = \frac{(8.5 - 5.0)^2}{(15)(1/3)(2/3)} = \frac{12.250}{3.333} = 3.675$$

By linear interpolation, we find that the value of P for a χ²
equal to or greater than 3.675 will be given by

$$\frac{3.675 - 2.706}{3.841 - 2.706}\,(.05 - .10) + .10 = .0573$$

and since we are interested only in 9 or more correct choices, the
desired probability will be .0573/2 = .0286.⁹ The interpolation
would, of course, have been unnecessary if we had simply taken
the square root of χ² to obtain z equal to 1.92 (rounded) and had
then entered the table of the normal curve with this value.

⁸ The slight discrepancy in the χ² and z probabilities is the result of
errors of rounding and interpolation.


10. SUMMARY 

Both the table of the normal curve and the table of χ² will
provide satisfactory methods for approximating the probabilities
obtained directly from the binomial, as long as certain precautions
are observed. The correction for continuity should always be
made. In using χ², we should take

$$\text{Corrected } \chi^2 = \sum \frac{(|o - e| - .5)^2}{e}$$

and in using z, we should take

$$\text{Corrected } z = \frac{|X - m| - .5}{\sigma} = \frac{|X - np| - .5}{\sqrt{npq}}$$

Failure to make the correction for continuity will result in obtain-
ing probabilities that underestimate (are smaller than) those of
the binomial through practically the whole range of frequencies.¹⁰

In reading probabilities from the table of the normal curve
and from the table of χ², care should be taken that the probability
corresponds to the hypothesis being tested. In all of the experi-
ments described in this chapter, we were interested only in fre-
quencies that were larger than the mean, for example, the proba-
bility of obtaining 8 or more correct choices when the mean number
expected by chance was 5. The desired probability is obtained
directly from the table of the normal curve and corresponds to the
proportion of the total area in the right tail cut off by an ordinate
erected at the lower limit of the frequency being evaluated. When
χ² is used to evaluate a similar hypothesis, the probability taken from
the table must be divided by 2.

⁹ The slight difference in the values of P obtained by the two methods
is again the result of errors of rounding and interpolation.

¹⁰ Goulden (1939), p. 103.

One additional precaution must be observed: np (or nq if q
is smaller than p) should not be less than 5.0. In general, the
larger this value is, the more satisfactory will be the χ² and z
approximations to the binomial probabilities. When this is not 
the case, the methods of the previous chapter should be applied. 
The approximation methods are extremely convenient to apply 
because of the simplicity of the calculations involved. If the 
precautions noted are observed, these methods will prove valuable 
in a variety of experimental problems. 


11. EXAMPLES 
Make the correction for continuity in all problems. 


1. In a testing center it has been determined, let us assume, 
that the average test scorer is in error on 6 per cent of the papers 
scored. A new employee scores 500 papers during a given day, 
and it is found that 40 of his papers are scored incorrectly. Let us 
assume that his errors should not exceed those which may be 
attributed to chance for the average scorer. We may think of 
the 500 papers which are scored as consisting of 500 trials of an 
event for which the probability of making an error in a single trial 
is .06. Using the table of the normal curve, what is the probability
of the scorer scoring 40 or more papers incorrectly by chance?

2. If there were 100 average test scorers working in the center, 
then within what limits might we expect the errors of 95 per cent
of them to fall on a given day, assuming that each one scored 500
papers? Note that in this problem we are interested in errors
which are below the mean as well as errors which are above the 
mean. If the table of the normal curve is used to find the answer, 
we need to locate the frequency corresponding to the ordinate at 
the right of the mean which will cut off .025 of the total area in the
right tail. Since the normal distribution is symmetrical, the corre-
sponding ordinate the same distance to the left of the mean will cut
off .025 of the total area in the left tail. Thus, we may expect the
errors of 95 per cent of the test scorers to fall within these limits.

3. A subject is trained to push a key when the first of 2 
tones which he hears is of greater intensity. The difference 
threshold for the subject is determined, and in a new series of 
trials 2 tones are sounded which differ in intensity, but for 
which the difference is below the threshold for the subject. The 
louder tone is randomly alternated with the weaker tone so that 
it sometimes appears first and sometimes second. The subject 
claims that he is unable to distinguish between the 2 tones— 
that he would have to guess. The experimenter tells him to go 
ahead and guess, and the subject does so for a series of 30 trials. 
If his judgment is simply a guess, then on each trial we may expect 
the probability of a correct guess to be .5. Use the table of the
normal curve to determine how many correct judgments he would
have to make before we would reject the hypothesis that he is
responding by chance.

4. A rat is trained to respond to the larger of 2 squares. The
rat is now given a series of 40 trials with 2 circles differing in size. 
If we assume that the rat will respond to the 2 circles by chance, 
i.e., that no preference will be shown for the larger of the 2 circles, 
then how many responses to the larger of the 2 circles must be 
made in the series of 40 trials before this hypothesis will be re- 
jected? We need to find the frequency corresponding to the 
ordinate which will cut off .05 of the area in the right tail of the 
normal curve.

5. A child is presented with 3 boxes, of which 2 are of the 
same color and 1 is of a different color. Candy is placed, without
the child's knowledge, under the box that is of the odd color.
He is then allowed to lift the boxes until he discovers the one which 
has the candy under it. After a series of trials, he immediately 
goes to the box with the candy. The situation is now changed by 
using boxes of the same color, but with 2 of the same size and 1 
that is of a different size. The child is given a series of 18 trials 
with the new boxes. Let us assume that there is no transfer of 
training from his experience with the colored boxes to the boxes
differing in size. The candy will always be placed under the box
which differs in size from the other 2, and we shall assume that the
probability of selecting this box by chance is 1/3. How many times, in the
series of 18 trials, must the child select the box with the candy 
under it before the hypothesis of chance will be rejected? This 
problem is similar to that described in the previous example. 

6. It is important to note the nature of the experimental 
design in the last two examples. Let us assume that the rat in 
Example 4 and the child in Example 5 both make a sufficiently 
large number of reactions in the direction indicated so that the 
hypothesis of chance reaction is rejected. This would mean that 
the probability of z is equal to or less than .05. Now, having 
rejected the hypothesis of chance reaction, you may feel inclined 
to draw the conclusion that this is evidence for transfer of training 
from the earlier trials to the present situations. That is to say, 
that the rat, for example, having learned to react to the larger of 
the 2 squares, generalizes this response to the larger of the 2 circles. 
This may or may not be the case. There may be no generalization 
whatsoever, the rat simply learning in a series of 40 trials to re- 
spond to the larger of the 2 circles. Likewise, the child may not 
generalize from the colored boxes to the boxes differing in size. 
The child’s responses may simply be evidence of learning in the 
series of 18 trials. We would want to compare the performance 
of the individuals in the new test situations with the performance 
of other individuals who had not been given the previous training 
experiences. If the rat or the child made significantly more 
correct choices in the new situation than individuals without the 
previous training, this might be taken as evidence of transfer or 
generalization of the previous learning. Methods of making such 
tests will be described in a later chapter. 

7. In a city of 500,000 homes, a census shows that 80 out of 
every 100 homes have a radio. An advertising agency takes a 
sample of homes in the city and includes an incidental question 
on radio ownership. The sample consists of 500 homes, and it is 
found that 380 of the dwellings in the sample have radios. If the 
sampling is random, how frequently would samples of 500 homes
with 380 or fewer radios be expected to occur? Note
that this is a test involving the left tail of the normal curve. 




8. Suppose, taking the data of the previous example, that 
we ask the question: How frequently, in random sampling, would
we expect to obtain samples with 380 or fewer homes with radios
or with 420 or more homes with radios? This
probability may be obtained directly from the χ² table and should
be equal to twice the probability obtained in Example 7. Note
the difference between the question asked here and the question asked
in the previous example. Two different hypotheses are being
tested.

9. In a large midwestern college it is known that 62 per cent
of the students are registered in the college of liberal arts. The 
campus daily draws a sample of 200 students for a public opinion 
poll and finds that in the sample there are 136 liberal arts students. 
If the sampling is random, how frequently would samples with 
136 or more liberal arts students be expected by chance when the
sample size is 200? The test should be made on the right tail of
the normal curve.

10. A child is seated at a table across from the experimenter.
The child is shown a desired object, and this is placed at the end
of the table toward the child’s right. Directly in front of the 
child across the table is a spot marked by an X. The child is 
blindfolded and asked to move a disk toward the spot. Let us 
assume that the probability of an error to the right of
the spot is equal to the probability of an error to the left. In a
series of 20 trials, the child makes 14 right errors and 6 left errors.
Does the child show a significant bias toward the position of the
desired object?

11. In the previous example, it is again important to examine
the design of the experiment. Let us assume that the child makes 
significantly more right errors than left errors. We would not wish 
to conclude, from this alone, that the errors are the result of the 
placing of the desired object on the child's right. For all we know, 
the child might show a right-error bias if the desired object had 
not been placed at the right. What would you want to do to 
check upon this possibility?

12. In an airplane factory it is found that over a period of 
time 202 accidents occurred. It is also known that 82 of these
occurred during the first 4 hr. of work and that 120 occurred during
the second 4 hr. Under the hypothesis that accidents should be
distributed evenly over the 2 work periods, there should be no 
difference in the 2 frequencies other than that to be expected as 
a result of random fluctuation. We are interested in possible 
deviations in both directions, and the test of the hypothesis may 
be made by means of χ².

13. It is found that a sample divides 50 : 150. Test the
hypothesis, by means of the χ² test, that it is a random sample
from a population dividing 1:4. Note that, in terms of the 
hypothesis stated, deviations in either direction will be of interest. 

14. A value of χ² equal to 8.41 is obtained and has but a
single degree of freedom. This value exceeds the 1 per cent value
in the table of χ². We thus know that P is less than .01. In
terms of the discussion of the chapter, a more precise statement
may be made concerning the value of P. Find the probability of
obtaining a value of χ² as large as or larger than 8.41 for 1 degree
of freedom.

15. Previous research indicates that the probability that a 
freshman will pass a given item in an aptitude test is .5. In a new 
incoming freshman class of 800 students, 440 pass the item. Can 
we conclude that the number passing the item is significantly 
greater than expectancy? Test the hypothesis by means of ie 

16. By linear interpolation into the proper table, find the 
probabilities indicated: 

(a) A χ² of 3.68 or larger.

(b) A plus z of 2.146 or larger. 

(c) An absolute value of z of 1.384 or larger. 


17. By linear interpolation, find the squares and square roots 
as indicated: 

(a) The square root of 430.8201. 

(b) The square of 595.58. 

(c) The square root of 656.8672. 

(d) The square of 400.275. 
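The normal-curve computations called for in Examples 1 through 15 can also be checked by machine. The sketch below is a minimal illustration under stated assumptions, not the procedure of the chapter: it assumes Python 3.8 or later, uses `statistics.NormalDist` in place of the table of the normal curve (so no interpolation is involved), always applies the correction for continuity, and relies on the fact that χ² with 1 degree of freedom is z² counted over both tails. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # stands in for the table of the normal curve


def upper_tail(x, n, p):
    """P(X >= x) for n trials with probability p per trial, using the
    normal approximation with the correction for continuity."""
    z = (x - 0.5 - n * p) / sqrt(n * p * (1 - p))
    return 1 - STD_NORMAL.cdf(z)


def lower_tail(x, n, p):
    """P(X <= x): left tail of the normal curve, corrected for continuity."""
    z = (x + 0.5 - n * p) / sqrt(n * p * (1 - p))
    return STD_NORMAL.cdf(z)


def critical_count(n, p, alpha=0.05):
    """Smallest frequency whose corrected right-tail probability is <= alpha."""
    for x in range(n + 1):
        if upper_tail(x, n, p) <= alpha:
            return x
    return None  # no frequency in 0..n is extreme enough


def chi2_1df_prob(chi2):
    """P of a chi-square this large or larger with 1 degree of freedom,
    using chi2 = z**2 and counting both tails of the normal curve."""
    return 2 * (1 - STD_NORMAL.cdf(sqrt(chi2)))
```

For instance, `upper_tail(40, 500, 0.06)` evaluates the probability asked for in Example 1, `critical_count(40, 0.5)` the number of responses asked for in Example 4, and `chi2_1df_prob(8.41)` the probability in Example 14. Because `critical_count` searches whole frequencies, it may differ by one from an answer read off the ordinate in borderline cases.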


CHAPTER 5 


Experiments Involving a Comparison of 
the Difference between 2 Frequencies or 
Proportions 




1. INTRODUCTION 


An experimental design which is frequently encountered in 
research involves a comparison between 2 frequencies or 2 propor- 
tions. For example, a group of N subjects may be divided at 
random into 2 groups of n₁ and n₂ subjects. One group might be
given intermittent reinforcement in a conditioning experiment and 
the other group reinforcement on each trial. After comparable 
training periods, a critical trial is given each subject to determine 
whether the conditioned response under investigation occurs or 
not. The experimenter is interested in knowing if the frequencies 
or proportions of response in the 2 groups are significantly different. 

In a learning experiment a group of N rats may be divided
at random into 2 groups of n₁ and n₂ rats. One group is given a
series of training trials in a maze under one set of experimental 
conditions. The second group is also given a training period, but 
under a different set of experimental conditions. In terms of a 
particular theory of learning, the rats in the first group should, 
when placed in a new maze with only 2 paths to the goal box, 
select one of the paths more frequently than the rats in the second
group. The difference in response is in accordance with the learning
theory, but the experimenter wishes to know whether the difference
is statistically significant.

To illustrate the methods involved in evaluating the outcomes
of experiments such as those described above, let us assume that
we have 2 groups of subjects with n₁ equal to 50 and n₂ equal to 30.
Group 1 has been subjected to one set of experimental conditions
and Group 2 to another set of experimental conditions. The
appearance of a particular choice, response, or behavior pattern 
during a critical test trial, which is in accordance with a theory of
the experimenter, may be called a success. The nonappearance
of this choice, response, or behavior pattern may be called a failure.
The frequencies of successes and failures for the 2 groups in the
critical test trial are given in Table 2.


TABLE 2. Frequency of Failures and Successes for 2 Random 
Samples on a Critical Test Trial 


Failure Success Total 
Group 1 8 42 50 
Group 2 12 18 30 
Total 20 60 80 


If Group 1 is expected, in terms of the theory, to show a 
higher proportion of successes, it is obvious that the outcome of the 
experiment is in the direction predicted by the theory, for the 
proportion of successes in Group 1 is 42/50 = .84, whereas
the proportion of successes in Group 2 is only 18/30 = .60. The 
problem of the experimenter is to determine whether these 2 
proportions differ significantly. 

It is not sufficient merely to point to the observed difference 
between the 2 proportions, for such differences may be expected to 
occur by chance or random sampling, even if the 2 groups had not 
been subjected to different experimental conditions. To be sure, 
a difference as large as that observed may be expected to occur 
infrequently when this is the case, but the problem remains to dis- 
cover precisely how infrequently. If the observed difference is as 
large as or larger than one that would be expected by chance but 
1 time in 100, then the experimenter may have a given degree of 
confidence in rejecting the chance hypothesis. On the other hand, 
if the observed difference is one that may be expected to occur 
by chance, let us say 40 times in 100, then the experimenter would 
have but little confidence in rejecting the chance hypothesis. 




DIFFERENCE BETWEEN 2 FREQUENCIES OR PROPORTIONS 75 


2. THE NULL HYPOTHESIS 

The outcomes of this particular experiment may be regarded 
as but one of a large number of possible outcomes if the experiment 
were repeated an indefinitely large number of times. If this were 
actually done and if the difference between the proportion of 
successes in the 2 groups were found for each of these repetitions 
of the experiment, the differences would form a sampling dis- 
tribution. We do not know, of course, what the mean of this 
sampling distribution would be, but we may specify it in terms of 
a hypothesis.

Let us argue that there is actually no difference between the 
effectiveness of the 2 experimental conditions, that the proportion 
of successes in the 2 groups should be equal. This is a familiar 
null hypothesis. If the experiment is repeated an indefinitely large 
number of times and we consistently subtract p₂ from p₁, then,
under the null hypothesis, some of these differences will be positive 
in sign and some will be negative. These differences will cluster 
around the mean of the sampling distribution which we have 
specified by hypothesis to be zero. The difference which we have
obtained in our particular experiment is .84 − .60, or .24, and this
value falls to the right of the mean of the sampling distribution. 
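The indefinitely repeated experiment described above can be imitated by simulation. The following is a minimal sketch, assuming that both groups are drawn from one population with a success proportion of .75 (the pooled value for these data; any common value would serve) and using an arbitrary 10,000 repetitions and random seed. Under the null hypothesis the simulated differences p₁ − p₂ should cluster around zero.

```python
import random
from statistics import mean, pstdev


def simulate_differences(n1, n2, p, reps=10_000, seed=1):
    """Repeat the experiment `reps` times under the null hypothesis that both
    groups come from one population with success proportion p, and return
    the difference p1 - p2 obtained on each repetition."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        s1 = sum(rng.random() < p for _ in range(n1))  # successes, Group 1
        s2 = sum(rng.random() < p for _ in range(n2))  # successes, Group 2
        diffs.append(s1 / n1 - s2 / n2)
    return diffs


diffs = simulate_differences(50, 30, 0.75)
print(f"mean of the differences:      {mean(diffs):+.4f}")
print(f"standard deviation:           {pstdev(diffs):.4f}")
# small tolerance so that 42/50 - 18/30 (= .24 up to rounding) is counted
share = sum(d >= 0.24 - 1e-9 for d in diffs) / len(diffs)
print(f"share of differences >= .24:  {share:.4f}")
```

With a common proportion of .75, the standard deviation of the simulated differences should come out close to .10, the standard error obtained analytically later in the chapter.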

Let us assume that this theoretical sampling distribution of 
differences also approximates a normal distribution. Then, on 
the basis of the methods described in the last chapter, it should be 
clear that if we knew the standard deviation (standard error) of 
this distribution, it would be possible to specify the relative fre- 
quency with which differences as large as or larger than .24 could 
be expected to occur in random sampling, when the mean of the 
sampling distribution is zero. For, by expressing .24 as a relative 
deviate, we could determine from the table of the normal curve the 
proportion of the total area falling to the right of an ordinate
erected at this point. If the observed difference is sufficiently 
large that the probability value, as obtained from the sample data, 
is .05 or less, then the hypothesis that the mean of the sampling 
distribution is zero would be rejected. The observed difference 
between the 2 proportions would be regarded as statistically 
significant, in other words, and the hypothesis that p₁ = p₂ = p
would be rejected. Hence, the experimenter would infer that, since
chance fluctuations do not seem to account for his observed difference,
the experimental conditions must have contributed
significantly to his results.


3. STANDARD ERROR OF THE DIFFERENCE BETWEEN 2 UNCORRELATED PROPORTIONS


Now it can readily be shown that if a random sample is
drawn from a population in which the proportion of successes is p,
the standard error of this proportion will be given by formula (16).
If we take one sample of n₁ cases from the population and a second
sample of n₂ cases, and if the 2 samples are independently drawn,
it can also be shown that the standard deviation of the sampling
distribution of differences between the proportions will be

$$\sigma_{p_1 - p_2} = \sqrt{\sigma_{p_1}^2 + \sigma_{p_2}^2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} \qquad (19)$$


The values of p₁ and p₂ are, of course, unknown. They are
not specified by hypothesis, for the experimenter, we assume, has
no advance knowledge of the proportion of successes to be expected
in the 2 groups of subjects under the experimental conditions. If
he had, he would have no need of experimentation. The experimenter
does have, however, the 2 sample values based upon the
data of the experiment, but these values themselves are subject to
sampling variability. It might occur to the experimenter to substitute
his sample values for the population parameters in formula
(19), but this standard error would not relate directly to the
hypothesis that p₁ = p₂ = p. A much sounder approach is
available.

In the experiment at hand, we have already specified by
hypothesis that p₁ = p₂ = p. This is the same as stating that the
2 samples are from a common population and that we have available
2 estimates of the parameter p of the common population.
These 2 proportions are .84 and .60. Under the hypothesis that
p₁ = p₂ = p, we may replace p₁ and p₂, as 2 separate estimates of
the common unknown population value p, by an estimate based
upon the pooled frequencies of the 2 samples. This is obtained by

$$p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} \qquad (20)$$

The estimate of q based upon the pooled frequencies will be given
by a similar formula, but it can be obtained quite readily from the
fact that q will be equal to 1 − p.

Under the null hypothesis, the values of p₁ and p₂ and q₁
and q₂ in formula (19) may be replaced by the common estimates
of these parameters based upon the pooled frequencies.

In order to simplify the notation, let us set up the schematic 
representation of the frequencies of Table 2 as shown in Table 3. 


TABLE 3. Schematic Representation for the Cell Frequencies in a 2 × 2 Table

            Failure   Success   Total
Group 1     a         b         a + b = n₁
Group 2     c         d         c + d = n₂
Total       a + c     b + d     a + b + c + d = N


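The letters refer to the cell frequencies. Now, by some rather
tedious algebra, it may be shown that under the null hypothesis
formula (19), with the pooled estimates p = (b + d)/N and
q = (a + c)/N substituted, is identical with formula (21) below. Thus,

$$\sigma_{p_1 - p_2} = \sqrt{\frac{(a + c)(b + d)}{N(n_1)(n_2)}} \qquad (21)$$

Substituting in formula (21), we obtain

$$\sigma_{p_1 - p_2} = \sqrt{\frac{(20)(60)}{(80)(50)(30)}} = \sqrt{\frac{1{,}200}{120{,}000}} = \sqrt{.01} = .10$$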
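Formulas (19), (20), and (21) can be checked against one another numerically. The following is a minimal Python sketch, assuming the cell layout of Table 3 (a and b are the failures and successes in Group 1, c and d those in Group 2); the function name is hypothetical.

```python
from math import sqrt


def pooled_standard_error(a, b, c, d):
    """Standard error of p1 - p2 under the null hypothesis, computed two ways
    for a 2 x 2 table laid out as in Table 3."""
    n1, n2 = a + b, c + d
    N = n1 + n2
    p1, p2 = b / n1, d / n2                 # observed proportions of successes
    p = (n1 * p1 + n2 * p2) / N             # formula (20): pooled estimate
    q = 1 - p
    via_19 = sqrt(p * q / n1 + p * q / n2)  # formula (19) with pooled p and q
    via_21 = sqrt((a + c) * (b + d) / (N * n1 * n2))  # formula (21)
    return p, via_19, via_21


p, se19, se21 = pooled_standard_error(8, 42, 12, 18)  # Table 2
print(f"{p:.2f} {se19:.4f} {se21:.4f}")  # both forms give .10 for Table 2
```

Formula (21) needs only the cell and marginal frequencies, which is why it is convenient for hand computation; formula (19) makes the role of the pooled p and q explicit.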


4. ONE-TAILED AND TWO-TAILED TESTS OF SIGNIFICANCE 


We may obtain z by expressing the observed difference between
the 2 proportions, .24, as a deviation from the mean of the
sampling distribution and then dividing this value by the standard
error of the sampling distribution, .10. But the mean of the
sampling distribution has been specified by hypothesis to be equal
to zero. Hence,

$$z = \frac{(p_1 - p_2) - m}{\sigma_{p_1 - p_2}} = \frac{(.84 - .60) - 0}{.10} = \frac{.24}{.10} = 2.4$$

By reference to the table of the normal curve, we find that
.0082 of the total area will fall to the right of an ordinate erected
at z = 2.4. Hence, the probability of a z as large as or larger than
2.4 may be taken as .0082. This is not, however, the probability desired for
evaluating the hypothesis that m is zero; for, under this hypothesis,
sometimes p₁ will be greater than p₂, and sometimes p₂ will be
greater than p₁. The value .0082 is for the probability of the
difference in the direction stated only, i.e., p₁ greater than p₂.
This probability, in other words, involves only one tail, the positive
tail, of the normal distribution curve and is often referred to as
that for a one-tailed test of significance.

The probability corresponding to an absolute value of z of 2.4,
positive or negative, will be given by the sum of the areas in the
two tails marked off by ordinates erected at 2.4 and −2.4. Since
the curve is perfectly symmetrical, this will be .0082 + .0082 =
.0164. The value .0164 is the probability of obtaining an absolute
value of z equal to or greater than 2.4, and this is the desired probability for
testing the hypothesis that the 2 samples are from a common
population.
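The step from z to one- and two-tailed probabilities can be sketched as follows, with Python's `statistics.NormalDist` standing in for the table of the normal curve (an assumption of this sketch, which needs Python 3.8 or later; the function name is hypothetical). The one-tailed value is the area beyond |z| in a single tail; the two-tailed value doubles it because the curve is symmetrical.

```python
from statistics import NormalDist


def z_and_tail_probabilities(diff, se, m=0.0):
    """z for an observed difference, with its one- and two-tailed
    probabilities; m is the hypothesized mean of the sampling distribution."""
    z = (diff - m) / se
    one_tailed = 1 - NormalDist().cdf(abs(z))  # area beyond |z| in one tail
    return z, one_tailed, 2 * one_tailed       # symmetric curve: double it


z, one, two = z_and_tail_probabilities(0.84 - 0.60, 0.10)
print(f"z = {z:.2f}, one-tailed P = {one:.4f}, two-tailed P = {two:.4f}")
```

The printed probabilities agree with the .0082 and .0164 read from the table above.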

In most of the experiments described in previous chapters, 
the experimenter was interested only in the probability of the 
frequency of correct choices exceeding a given value, and this was 
properly a one-tailed test. A significantly small number of correct
choices would have no positive bearing, for example, upon the claim
of the farmer. All frequencies of choices less than the significant
number of correct choices would constitute negative evidence
concerning the farmer's ability to locate water with the aid of the
divining rod. Results in only one direction would relate to his
claim of success or enable the experimenter to reject the null
hypothesis.

Experiments involving 2 or more samples and the hypothesis
that the samples are from a common population are logically
evaluated by a two-tailed test of significance. A deviation in either 
direction from the specified population mean difference of zero in 
the present problem, for example, would bear upon the tenability 
of the hypothesis tested. 

For convenience and in terms of growing practice in statistics, 
we may refer to the significance standard of .05 for a one-tailed 
test, involving a positive or negative value of z but not both, as the 
5 per cent point. By this we shall mean that point beyond which 
5 per cent of the total area falls in one tail of the distribution.
When we speak of a 5 per cent level of significance, we shall have
reference to an absolute value of z, and this will be established so
that 2.5 per cent of the total area falls in each tail.¹ Thus, for the
normal distribution a z of 1.65, approximately, will be significant 
at the 5 per cent point. A value of 1.96 will be significant at the 
5 per cent level. 
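The two critical values can be recovered from the inverse of the normal distribution function. In this sketch `NormalDist.inv_cdf` from Python's standard library replaces the table lookup; it is an illustration, not part of the original method.

```python
from statistics import NormalDist

std_normal = NormalDist()

# 5 per cent point: 5 per cent of the area beyond the point in one tail only
five_per_cent_point = std_normal.inv_cdf(1 - 0.05)
# 5 per cent level: 2.5 per cent of the area beyond |z| in each tail
five_per_cent_level = std_normal.inv_cdf(1 - 0.05 / 2)

print(f"{five_per_cent_point:.3f} {five_per_cent_level:.3f}")
```

Rounded, these are the 1.65 and 1.96 quoted in the text.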

In the present problem, the probability of obtaining an
absolute value of z as large as 2.4 is .0164 and, in terms of a
significance level of .05, must be regarded as significant. The hypothesis
that the 2 samples are from a common population is rejected, because
random sampling from a common population would yield a
z as large as or larger than this with a relative frequency less than
.05. Fact (the data or observed difference between p₁ and p₂)
and hypothesis (random sampling from a common population) are
not in accord. If we accept the data, we reject the hypothesis at
the significance level of .0164.

With rejection of the null hypothesis, the experimenter may 
infer that the experimental condition to which Group 1 was sub- 
jected was influential in bringing about a larger proportion of suc- 
cessful responses than the experimental condition to which Group 2 
was subjected. This inference, in turn, may be brought into relation
with the particular theory which is being investigated.


¹ It may be observed that the probability values of χ² for 1 degree of
freedom correspond to the probability values of the two-tailed z test. Thus,
in the previous chapter, it was necessary to halve the probability obtained
from the χ² table in order to make it comparable to the probability
obtained from the table of the normal curve corresponding to a plus value of
z only.


5. THE χ² TEST FOR THE DIFFERENCE BETWEEN UNCORRELATED
PROPORTIONS 


As might be expected from earlier discussions, χ² could be
used as well as the normal distribution to evaluate the outcomes
of the experiment described. The assumptions involved are much
the same. The hypothesis to be tested is that the 2 samples are
random samples from a common population. On the basis of this
hypothesis, the frequencies of successes and failures in the 2
samples are pooled to arrive at common estimates of the population
values. Thus, we have by formula (20)

$$p = \frac{42 + 18}{50 + 30} = \frac{60}{80} = .75 \qquad q = \frac{8 + 12}{50 + 30} = \frac{20}{80} = .25$$


If each sample is from a common population, the expected 
frequency of successes in each sample would be given by multiplying
.75 by the sample sizes. For Group 1, for example, the expected
frequency of successes is n₁p = 37.5, and for Group 2 the
expected frequency of successes is n₂p = 22.5. Similarly, the
expected frequency of failures in the 2 samples would be given by 
multiplying .25 by the sample sizes. The expected frequencies
are given in Table 4.

TABLE 4. Expected Frequencies for the Data of Table 2 Assuming Random
Sampling from a Common Population

            Failure   Success   Total
Group 1     12.5      37.5      50
Group 2      7.5      22.5      30
Total       20.0      60.0      80

Subtracting the entries in Table 4 from the corresponding entries
in Table 2, we obtain the values of (o − e) shown in Table 5.

TABLE 5. Deviations of the Observed Frequencies of Table 2 from the
Expected Frequencies of Table 4

            Failure   Success
Group 1     −4.5       4.5
Group 2      4.5      −4.5

It may be observed that the squares of the discrepancies
(o − e)² are all equal, and this will always be true when we have
but 2 groups to compare, when the frequencies within each group
are pooled to arrive at a common estimate, and when we have but
a single degree of freedom. This fact enables us to write

$$\chi^2 = (o - e)^2 \sum \frac{1}{e} \qquad (22)$$


The values of 1/e may be found from a table of reciprocals (Table
II, Appendix). Having found these reciprocals, we sum them
and then multiply the sum by (o − e)². This will give us

$$\chi^2 = (4.5)^2(.08000 + .02667 + .13333 + .04444) = (20.25)(.28444) = 5.76$$
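Formula (22) is easy to program once the expected frequencies are formed from the marginal totals. The sketch below is illustrative only: it assumes the Table 3 layout and, like formula (22) itself, applies only to a 2 × 2 table with 1 degree of freedom, where every cell has the same (o − e)². The function name is hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square by formula (22) for a 2 x 2 table laid out as in Table 3.
    Returns the chi-square and the 4 expected frequencies."""
    n1, n2 = a + b, c + d
    col1, col2 = a + c, b + d
    N = n1 + n2
    # expected frequency = row total * column total / N
    expected = [n1 * col1 / N, n1 * col2 / N, n2 * col1 / N, n2 * col2 / N]
    deviation = a - expected[0]  # o - e for the upper-left cell
    chi2 = deviation ** 2 * sum(1 / e for e in expected)
    return chi2, expected


chi2, expected = chi_square_2x2(8, 42, 12, 18)  # Table 2
print(expected)       # the entries of Table 4
print(f"{chi2:.2f}")
```

Summing the reciprocals once and multiplying by the common (o − e)² replaces 4 separate divisions, which is the labor the formula was meant to save.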

The value of χ², 5.76, is equal to the square of the value of z,
2.4, for the same problem. This relationship between z and χ²
holds, as was pointed out before, only when χ² is based upon 1
degree of freedom. This must be, then, the number of degrees
of freedom available for χ² in the present problem. How do we
arrive at this number? The 4 cell frequencies of the 2 × 2 table
may be thought of as contributing a total of 4 degrees of freedom.
But one restriction placed upon the data involves N, the sum of
the 2 sample sizes. If N is to remain constant, then 1 degree
of freedom is lost here. Furthermore, if N is given, only one of
the row totals is free to vary, and this results in the loss of a second
degree of freedom. It will also be true that only one of the column
totals will be free to vary, and this results in the loss of a third
degree of freedom. From the 4 degrees of freedom contributed
by the cell frequencies, 3 are lost and 1 is left for evaluating χ².

The argument above amounts to saying that only 1 of the
cell frequencies in a 2 × 2 table may be filled in arbitrarily, if the
marginal totals are to remain the same. In general, in an r × c
table, the number of degrees of freedom available may be determined
quite easily by multiplying the number of rows minus 1 by
the number of columns minus 1. In the present problem we have

$$(r - 1)(c - 1) = (2 - 1)(2 - 1) = 1$$
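For tables larger than 2 × 2 the shortcut of formula (22) no longer applies, because the squared discrepancies differ from cell to cell. A minimal sketch of the general computation follows, assuming that expected frequencies are formed from the marginal totals as in Table 4 and that χ² is the sum of (o − e)²/e over all cells; the function names are hypothetical.

```python
def expected_frequencies_and_df(table):
    """Expected cell frequencies under the hypothesis of a common population,
    and the degrees of freedom (r - 1)(c - 1), for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, df


def chi_square(table):
    """General chi-square: the sum of (o - e)**2 / e over every cell."""
    expected, df = expected_frequencies_and_df(table)
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    return chi2, df


print(chi_square([[8, 42], [12, 18]]))  # Table 2: about 5.76 with 1 df
```

For a 2 × 2 table this returns the same value as formula (22); for a 3 × 4 table it would report (3 − 1)(4 − 1) = 6 degrees of freedom.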

By reference to the table of χ² we find that the obtained value
of 5.76 for 1 degree of freedom has a probability between .02 and
.01. Linear interpolation would give approximately .017 as the
probability. Errors of rounding and the method of interpolation
account for the slight discrepancy between this value and the value
of .0164 obtained from the table of the normal curve for the two-tailed
test. No conclusions concerning significance would be
changed. The hypothesis of random sampling from a common
population would still be rejected.


6. THE CORRECTION FOR CONTINUITY 


In the case of earlier experiments where the outcomes were 
expressed in terms of discrete frequencies, we found it was neces- 
sary to make a correction in order to evaluate the data by means 
of the table of the normal curve or the table of χ² because of the
continuous nature of these distributions. The correction for 
continuity is necessary here also. We could have demonstrated 
it in connection with the problem discussed, but we did not do so 
because it was felt that it would be better to gain an understanding 
of the nature of the test of significance applied to problems of this 
sort without confusing the picture by digressing with an explana- 
tion of the correction for continuity at that time. 

Making the correction in 2 × 2 tables is very easy, and it should
always be done. The correction will not change the relationship
between χ² and z, but will have the effect of reducing the value
of both in the same relative way, so that χ² will still be equal
to z².

If the data are to be evaluated by means of the table of the
normal curve, the correction involves only a minor change in the
numerator of the z ratio. The observed value of p₁ was based
upon a frequency of 42 which, when divided by the sample size
50, gave .84. Similarly, the value of p₂ was based upon a frequency
of 18 which, when divided by the sample size 30, gave .60. Correcting
these values for continuity involves adding .5 to the smaller of
the 2 frequencies (18) and subtracting .5 from the larger (42),
and then recalculating the values of p₁ and p₂. Thus,

$$p_1 = \frac{41.5}{50} = .8300 \qquad p_2 = \frac{18.5}{30} = .6167$$

The difference between p₁ and p₂ will now be .8300 − .6167 =
.2133. Dividing this difference by the standard error of the 
difference, which we have previously found to be equal to .10, 
we obtain .2133/.10 = 2.133 for the value of z corrected for con- 
tinuity. By reference to the table of the normal curve, we find 
this value to be significant, and the hypothesis of random sampling 
from a common population would still be rejected. 

In the case of χ², the correction is also made quite simply.
It merely involves reducing the absolute value of (o − e) by .5
in formula (22). Doing this, we would now have

$$\chi^2 = (4.5 - .5)^2(.28444) = (4)^2(.28444) = 4.55$$

It can be proved that the value of χ² corrected for continuity will
be equal to the value of z² corrected for continuity. In the present
example, the corrected value of χ² is 4.55, and the corrected value
of z is 2.133. Thus, we have (2.133)² = 4.55.
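Both corrections can be carried out in one function, which also makes the identity between the corrected χ² and the square of the corrected z easy to verify. This is a sketch under the Table 3 layout; it assumes, as in the text, that the success frequencies b and d are the ones adjusted, each moved .5 toward the other group. The function name is hypothetical.

```python
from math import sqrt


def corrected_z_and_chi2(a, b, c, d):
    """z and chi-square, both corrected for continuity, for a 2 x 2 table
    laid out as in Table 3 (b and d are the success frequencies)."""
    n1, n2 = a + b, c + d
    N = n1 + n2
    # Move each success frequency .5 toward the other group, which
    # shrinks the difference between the 2 proportions.
    if b / n1 >= d / n2:
        p1, p2 = (b - 0.5) / n1, (d + 0.5) / n2
    else:
        p1, p2 = (b + 0.5) / n1, (d - 0.5) / n2
    se = sqrt((a + c) * (b + d) / (N * n1 * n2))  # formula (21)
    z = (p1 - p2) / se
    expected = [n1 * (a + c) / N, n1 * (b + d) / N,
                n2 * (a + c) / N, n2 * (b + d) / N]
    deviation = abs(a - expected[0]) - 0.5  # |o - e| reduced by .5
    chi2 = deviation ** 2 * sum(1 / e for e in expected)  # formula (22)
    return z, chi2


z, chi2 = corrected_z_and_chi2(8, 42, 12, 18)  # Table 2
print(f"z = {z:.3f}, z squared = {z * z:.2f}, chi-square = {chi2:.2f}")
```

If the groups are listed in the other order, z changes sign but χ² and z² are unchanged, which is why χ² always corresponds to the two-tailed test.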

The correction for continuity, in the problem discussed,
resulted in no change with respect to our attitude toward the
statistical significance of the outcome of the experiment. In
critical cases, however, failure to apply the correction may result
in the rejection of the hypothesis tested, whereas the corrected
value of z or χ² may not be significant.

In using the normal distribution to obtain an approximation
of the probabilities of the binomial, we had a rule that np (or nq
if q was smaller than p) should equal at least 5.0. A similar rule
may be stated for the use of the table of the normal curve and the
table of χ² in evaluating the data of a 2 × 2 table. The methods
described here will give fairly good approximations to the probabilities
obtained by more exact methods, as long as no expected cell
frequency is less than 5.0. When this is not the case, the probabilities
obtained from the table of the normal curve and the table of
χ² may be seriously in error. Under this circumstance, the more
exact methods described by Yates (1934b) and Fisher (1936)
should be used.


Fisher (1936) has shown, for example, that the probability of
any observed set of frequencies in a 2 × 2 table will be given by
the product of the factorials of the 4 marginal totals divided by
the product of the factorials of the grand total and the 4 cell
entries. Using the schematic representation of Table 3, the probability
will be given by

$$P = \frac{(a + b)!\,(c + d)!\,(a + c)!\,(b + d)!}{N!\,a!\,b!\,c!\,d!} \qquad (23)$$

Thus, the probability of obtaining the set of frequencies given in
Table 2 will be

$$P = \frac{50!\,30!\,20!\,60!}{80!\,8!\,42!\,12!\,18!}$$


The desired probability, however, involves not only the set of
cell frequencies as recorded in Table 2, but all other possible sets
which are more extreme with the marginal totals remaining the same.
Thus, we also need the probabilities for the following sets of
frequencies:²

 7 43     6 44     5 45     4 46
13 17    14 16    15 15    16 14

 3 47     2 48     1 49     0 50
17 13    18 12    19 11    20 10

Direct calculation shows that the sum of the values of P for the
set of frequencies as given in Table 2 and the other 8 possible
more extreme sets as indicated above is equal to .017.

We found previously that z corrected for continuity was 
2.133. The value of P corresponding to a one-tailed test of sig- 
nificanee with an obtained value of z equal to 2.133 is .016, and 
this value of P may be compared with that obtained by the direct 


? These caleulations are facilitated by tables of the logarithms of fac- 
torials, Pearson (1914) gives the logarithms for factorials up to 1,000. The 
Mathematical Tables from Handbook of Chemistry and Physics, previously re- 
ferred to (p. 46), gives the logarithms for factorials up to 100. 2 


DIFFERENCE BETWEEN 2 FREQUENCIES OR PROPORTIONS — 85 


method which is .017. For the two-tailed test of significance, as 
made by means of x?, these probabilities must be doubled. Thus, 
the value of P corresponding to the obtained value of x? (corrected 
for continuity) of 4.55 will be (2)(016) = .032 and the corre- 
sponding value of P obtained by the direct method will be (2)(.017) 
= .034. The agreement between the probabilities, in this instance, 
is very satisfactory, but this will not always be the case, particularly 
if the theoretical cell frequencies are small. 
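The direct method can be sketched in a few lines of Python. This is an illustration, not the procedure of the text (which relied on tables of log factorials): it uses `math.comb` (Python 3.8+) so the factorial ratio of Fisher's formula is evaluated exactly, then sums P over the observed table and every more extreme table with the same margins. The function names are hypothetical, Table 2 is again assumed to be a = 8, b = 42, c = 12, d = 18, and the loop moves toward smaller a because that is the direction in which Table 2 departs from expectation; a table departing the other way would need the loop reversed.

```python
from math import comb


def table_probability(a, b, c, d):
    """Probability of a single 2 x 2 table with fixed margins (Fisher, 1936).

    Algebraically equal to (a+b)!(c+d)!(a+c)!(b+d)! / (N! a! b! c! d!),
    but written with binomial coefficients so no huge factorials are formed.
    """
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)


def direct_p_one_tailed(a, b, c, d):
    """Sum of P for the observed table and every more extreme table with the
    same margins, where 'more extreme' means a smaller value of cell a."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    total = 0.0
    for k in range(a, -1, -1):
        kb, kc = row1 - k, col1 - k      # keep row 1 and column 1 totals fixed
        kd = n - row1 - kc               # row 2 total minus cell c
        if min(kb, kc, kd) < 0:          # no such table with these margins
            break
        total += table_probability(k, kb, kc, kd)
    return total


# Table 2, with cells assumed to be a = 8, b = 42, c = 12, d = 18.
p_one = direct_p_one_tailed(8, 42, 12, 18)
print(round(p_one, 3))           # 0.017
print(round(2 * p_one, 3))       # two-tailed value, comparable with chi-square
```

Doubling the unrounded sum can differ in the third decimal from the (2)(.017) = .034 of the text, which doubles a value already rounded to three places.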

In practice, however, the experimenter's interest may not be primarily in the exact value of P for any given hypothesis, but rather in whether or not this hypothesis is to be regarded with suspicion. For this purpose, it does not seem likely that he will be led astray seriously if the χ² test is applied to 2 × 2 tables with theoretical cell frequencies as small as 5, as long as the obtained value of χ² corrected for continuity is sufficiently large or sufficiently small to indicate that a conclusion concerning significance will not be changed if the test is made by means of the direct method. If the value of P obtained by means of the χ² test is of borderline significance (P approximately .05), then the direct method may be applied and the decision to accept or reject the hypothesis may be made upon the basis of the value of P thus obtained. It is important to remember that the value of P obtained by the direct method must be doubled if it is to correspond to the tabled probabilities for χ².


7. ANOTHER METHOD FOR CALCULATING χ²


Another method for the calculation of χ² for a 2 × 2 table may be described. This method is convenient for use with a calculating machine or with logarithms. The formula can be given in terms of the schematic representation of the cell frequencies shown in Table 3. Thus,

$$\chi^2 = \frac{N(bc - ad)^2}{(a+b)(c+d)(a+c)(b+d)} \qquad (23)$$

The correction for continuity is made by subtracting N/2 from the absolute value of the difference (bc − ad) before squaring. Thus, χ² corrected for continuity is

$$\chi^2 = \frac{N\left(|bc - ad| - \dfrac{N}{2}\right)^2}{(a+b)(c+d)(a+c)(b+d)} \qquad (24)$$

Applying formula (24) to the data of Table 2, we obtain

$$\chi^2 = \frac{80\left(|504 - 144| - \dfrac{80}{2}\right)^2}{(50)(30)(20)(60)} = 4.55$$

which is identical with the value corrected for continuity previously obtained by means of formula (22).
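Formulas (23) and (24) translate directly into a small Python function. This is a sketch under the same assumption as before (Table 2 cells a = 8, b = 42, c = 12, d = 18); the name `chi_square_2x2` is hypothetical. The uncorrected value is printed only to show how much the correction for continuity changes the result.

```python
def chi_square_2x2(a, b, c, d, correct=True):
    """Chi-square for a 2 x 2 table laid out as in Table 3 (a b / c d).

    correct=False gives formula (23); correct=True gives formula (24),
    in which N/2 is subtracted from |bc - ad| before squaring.
    """
    n = a + b + c + d
    diff = abs(b * c - a * d)
    if correct:
        # Clamping at zero is an added convention (not from the text) so that
        # the correction can never increase chi-square for tiny differences.
        diff = max(diff - n / 2, 0.0)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


print(round(chi_square_2x2(8, 42, 12, 18), 2))                 # 4.55
print(round(chi_square_2x2(8, 42, 12, 18, correct=False), 2))  # 5.76
```

Taking the square root of the corrected value recovers the corrected z of 2.133 found earlier.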


8. CORRELATED PROPORTIONS 


In the problem just described, we have dealt with independent 
samples. The independence of the 2 groups was assured by the 
method of randomly assigning the subjects to the 2 experimental 
conditions to be compared. If the samples are independent, then 
we have no reason to believe that the values of p₁ and p₂ are correlated. That is to say, with successive repetitions of the experiment with comparable samples, the values of p₁ and p₂ will not be expected to vary together.

There are other conditions where this may not be the case. 
We might, for example, test the same group of subjects under 2 
different sets of experimental conditions. We might then wish to 
compare the difference between the proportion of successes, where 
a success is the appearance of some response under investigation 
in the 2 experimental conditions. Another common example is 
when a group of subjects is given a test and we wish to compare 
the proportion passing or responding to an item in some particular 
fashion with the proportion in the same group passing another 
item. 

Let us suppose that we have 200 subjects and that they are 
tested for the appearance of a particular discriminatory response 
before (Test I) and after (Test II) experiencing a set of experi- 
mental conditions. We will thus have a pair of observations 
for each subject. If the discriminatory response is made, we shall 




call this a success (S), and if it is not, we shall call this a failure (F). Then there are 2 ways in which a subject may respond at the time of the first test (S₁ or F₁), and either of these 2 ways may be followed by either of 2 ways at the time of the second test (S₂ or F₂), so that we have (2)(2) or 4 possible patterns of response:

S₁F₂
S₁S₂
F₁S₂
F₁F₂


Let us assume that in our particular experiment, the frequencies corresponding to the above patterns are 20, 80, 40, and 60, respectively. These frequencies are shown in Table 6. The proportion of successes at the time of the first test will be the sum of the frequencies corresponding to patterns S₁F₂ and S₁S₂ divided by the total number of responses. Thus, the proportion of successes will be equal to 100/200 = .50 for the first test and 120/200 = .60 for the second test. We wish to know whether p₁ = .50 and p₂ = .60 differ significantly.

TABLE 6. Frequency of Failures and Successes of 200 Subjects under Test Condition I and under Test Condition II

                          Test II
                     Failure   Success   Total
Test I   Success        20        80      100
         Failure        60        40      100
         Total          80       120      200


9. STANDARD ERROR OF THE DIFFERENCE BETWEEN 
CORRELATED PROPORTIONS 
Under the conditions described, the standard error of the difference between the 2 proportions will be given by

$$\sigma_{p_1 - p_2} = \sqrt{\sigma_{p_1}^2 + \sigma_{p_2}^2 - 2\rho_{p_1p_2}\,\sigma_{p_1}\sigma_{p_2}} \qquad (25)$$

where $\rho_{p_1p_2}$ is the correlation coefficient between the values of the two p's which might be expected with successive repetitions of the experiment. We, of course, do not know what the value of ρ is, any more than we know what the values of $\sigma_{p_1}^2$ and $\sigma_{p_2}^2$ are. It is known, however, that the correlation can be estimated from the 2 × 2 table. The product moment correlation coefficient for the 2 × 2 table is often called the fourfold point coefficient or the phi coefficient. Let us assign the value of 0 to a failure and the value of 1 to a success. Then the calculation of the phi coefficient can be expressed in terms of the schematic representation of the frequencies in Table 3. Thus,


$$r = \frac{bc - ad}{\sqrt{(a+b)(c+d)(a+c)(b+d)}} \qquad (26)$$

Under the hypothesis of random sampling from a common population, with a common value $\sigma_p^2$ replacing $\sigma_{p_1}^2$ and $\sigma_{p_2}^2$ in formula (25), and with r taken as the fourfold point coefficient of formula (26), McNemar (1947) has shown that the desired standard error of the difference between the 2 proportions can be obtained in a very simple fashion. Under the conditions stated, assuming the null hypothesis, formula (25) reduces to

$$\sigma_{p_1 - p_2} = \sqrt{\frac{d + a}{N^2}} \qquad (27)$$

Substituting in the above formula, we obtain

$$\sigma_{p_1 - p_2} = \sqrt{\frac{40 + 20}{(200)^2}} = \sqrt{.0015} = .0387$$

Then z will be given by the difference between the 2 proportions p₁ and p₂ divided by the standard error of the difference. Thus,

$$z = \frac{p_2 - p_1}{\sigma_{p_1 - p_2}} = \frac{.60 - .50}{.0387} = 2.58$$


From the table of the normal curve, we find that a plus value 
of z equal to 2.58 or larger, under the conditions stated, may be 
expected with a relative frequency given by the proportion of the 
total area falling beyond an ordinate erected at this point. This 




is equal to .0049, and the probability corresponding to an absolute 
value of z equal to or greater than 2.58 will be the area in both 
tails of the curve or .0049 + .0049 = .0098. The obtained value 
of z is thus significant at the 1 per cent level, and, abiding by our 
standards, the null hypothesis will be rejected. We reject the 
hypothesis that the data are from a common population and 
conclude that p₁ and p₂ differ significantly. The reason for this,
we may assume, lies in the nature of the experimental set of 
conditions to which the individuals in our experiment were sub- 
jected before the time of the second testing. The experimental 
conditions, we believe, did something to the subjects which 
resulted in a significantly greater frequency of successes at the 
time of the second test. 
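The computation for correlated proportions can be sketched in Python as follows. This is an illustration, not part of the original text; the function names are hypothetical, and the normal-curve table is replaced by `math.erfc`, since the area in both tails beyond |z| equals erfc(|z|/√2). The cells are entered in the layout of Table 6, so a and d are the two cells in which the response changed.

```python
from math import erfc, sqrt


def correlated_z(a, b, c, d):
    """z for the difference between correlated proportions, formula (27).

    Layout as in Table 6: first row = success on Test I (a failure, b success
    on Test II); second row = failure on Test I (c failure, d success on
    Test II).  Cells a and d hold the subjects whose response changed.
    """
    n = a + b + c + d
    p1 = (a + b) / n                 # proportion of successes on Test I
    p2 = (b + d) / n                 # proportion of successes on Test II
    se = sqrt((d + a) / n ** 2)      # standard error under the null hypothesis
    return (p2 - p1) / se, se


def two_tailed_p(z):
    """Area in both tails of the unit normal curve beyond |z|."""
    return erfc(abs(z) / sqrt(2))


z, se = correlated_z(20, 80, 60, 40)
print(round(se, 4), round(z, 2))     # 0.0387 2.58
print(round(two_tailed_p(z), 4))     # 0.0098
```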


10. THE χ² TEST FOR CORRELATED PROPORTIONS


Let us now examine the relationship between the value of z just obtained and the value of χ² for the same problem. The number of degrees of freedom will still be equal to 1, and under this condition we may expect χ² to be equal to the value of z², as before. The value of χ², obtained under the same conditions and assumptions as z, may be expressed in terms of the schematic representation of the cell frequencies, as McNemar (1947) has shown. Thus,

$$\chi^2 = z^2 = \frac{(p_2 - p_1)^2}{\dfrac{d + a}{N^2}} = \frac{\left(\dfrac{d - a}{N}\right)^2}{\dfrac{d + a}{N^2}} = \frac{(d - a)^2}{d + a} \qquad (28)$$

Substituting the frequencies from Table 6 in formula (28) above, we obtain

$$\chi^2 = \frac{(40 - 20)^2}{40 + 20} = 6.67$$

It may also be noted that z squared, (2.58)², is equal to 6.66, the slight discrepancy between this and the value of χ², 6.67, being the result of rounding errors involved in the calculations.
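A quick numerical check, offered as an illustrative sketch rather than part of the original, shows that the 6.66 versus 6.67 discrepancy is purely a matter of rounding: computed without intermediate rounding, z² from formula (27) and χ² from formula (28) agree to machine precision.

```python
from math import isclose, sqrt

a, d, n = 20, 40, 200          # changed-response cells of Table 6 and N
p1, p2 = 100 / n, 120 / n      # proportions of successes on Test I and Test II

z = (p2 - p1) / sqrt((d + a) / n ** 2)   # z with the standard error of formula (27)
chi2 = (d - a) ** 2 / (d + a)            # formula (28)

print(round(chi2, 2))         # 6.67
print(isclose(z ** 2, chi2))  # True
```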




11. CORRECTING FOR CONTINUITY: CORRELATED 
PROPORTIONS 


The values of z and χ² obtained above should be corrected for continuity.³ The method of making the correction, if z is to be calculated, involves only a minor change in the numerator of the z ratio. We add .5 to the frequency corresponding to the smaller value of p and subtract .5 from the frequency corresponding to the larger value of p. Thus, the corrected values of p₁ and p₂ will be

$$p_1 = \frac{100.0 + .5}{200} = .5025 \qquad \text{and} \qquad p_2 = \frac{120.0 - .5}{200} = .5975$$

The difference between the corrected values of p₁ and p₂, .5975 − .5025 = .0950, is found most easily by taking

$$p_2 - p_1 = \frac{|d - a| - 1}{N} \qquad (29)$$

Substituting in the above formula, we obtain

$$p_2 - p_1 = \frac{|40 - 20| - 1}{200} = .0950$$

as before. Thus, z corrected for continuity is equal to .095/.0387 = 2.45, and the probability of obtaining this value may be evaluated by means of the table of the normal curve.

χ² corrected for continuity will be equal to z² corrected for continuity, and thus,

$$\chi^2 = \frac{\dfrac{(|d - a| - 1)^2}{N^2}}{\dfrac{d + a}{N^2}} = \frac{(|d - a| - 1)^2}{d + a} \qquad (30)$$

Substituting in formula (30) above, we obtain

$$\chi^2 = \frac{(|40 - 20| - 1)^2}{40 + 20} = 6.02$$

the discrepancy between this value of χ² and z² = (2.45)² = 6.00 again being the result of errors of rounding in the calculations.

³ Edwards (1948).
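The corrected quantities of formulas (29) and (30) can be computed together in a short Python sketch (illustrative only; `corrected_correlated` is a hypothetical name). Working without intermediate rounding, z² and χ² coincide exactly, which confirms that the 6.00 versus 6.02 difference is due to rounding.

```python
from math import sqrt


def corrected_correlated(a, d, n):
    """Continuity-corrected difference (29), z, and chi-square (30) for
    correlated proportions; a and d are the changed-response cells."""
    numerator = max(abs(d - a) - 1, 0)   # |d - a| - 1, never below zero
    diff = numerator / n                 # formula (29)
    se = sqrt((d + a) / n ** 2)          # standard error of formula (27)
    z = diff / se
    chi2 = numerator ** 2 / (d + a)      # formula (30)
    return diff, z, chi2


diff, z, chi2 = corrected_correlated(20, 40, 200)
print(round(diff, 4), round(z, 2), round(chi2, 2))   # 0.095 2.45 6.02
```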


12. THE INFLUENCE OF CORRELATION ON THE TEST OF SIGNIFICANCE


In testing the significance of the difference between frequencies or proportions when the same group of subjects is involved, the formulas for the standard error of the difference or χ², which take into account the presence of possible correlation, should always be used. The failure to take the correlation into consideration, if it is positive, may result in not rejecting the hypothesis tested at the level of significance agreed upon, because an inflated error term (the standard error of the difference) is used in the test of significance. The presence of positive correlation will always serve to reduce the standard error of the difference, as an examination of formula (25) will show: when ρ is positive, the term $2\rho\sigma_{p_1}\sigma_{p_2}$ is subtracted from the sum of the 2 variances. Under this circumstance, taking the correlation into account may result in a significant value of z, whereas failure to consider the correlation element may not.

An equally serious error will be made if the correlation is negative and this is not taken into account in the test of significance. Under this circumstance, ignoring the correlation will result in an underestimation of the standard error compared with the value obtained when the correlation is considered. Ignoring the correlation may thus result in a spuriously large value of z.
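The effect of positive correlation can be made concrete with the data of Table 6. The Python sketch below is illustrative and rests on an assumption not in the text: each variance in formula (25) is estimated separately as p(1 − p)/N from the sample, with r taken from formula (26). Formula (27) instead pools the variances under the null hypothesis, so the corrected standard error here (.0381) is slightly smaller than the .0387 of the text; the point is only the comparison with r set to zero.

```python
from math import sqrt


def phi(a, b, c, d):
    """Fourfold point (phi) coefficient, formula (26)."""
    return (b * c - a * d) / sqrt((a + b) * (c + d) * (a + c) * (b + d))


def se_difference(p1, p2, n, r):
    """Standard error of the difference by formula (25).  Each variance is
    estimated separately as p(1 - p)/N, an illustrative assumption."""
    s1 = sqrt(p1 * (1 - p1) / n)
    s2 = sqrt(p2 * (1 - p2) / n)
    return sqrt(s1 ** 2 + s2 ** 2 - 2 * r * s1 * s2)


a, b, c, d = 20, 80, 60, 40                  # Table 6
n = a + b + c + d
p1, p2 = (a + b) / n, (b + d) / n
r = phi(a, b, c, d)

se_ignoring_r = se_difference(p1, p2, n, 0.0)   # correlation left out
se_with_r = se_difference(p1, p2, n, r)         # correlation taken into account
print(round(r, 3))                                   # 0.408
print(round(se_ignoring_r, 4), round(se_with_r, 4))  # 0.0495 0.0381
print(round((p2 - p1) / se_ignoring_r, 2), round((p2 - p1) / se_with_r, 2))  # 2.02 2.63
```

With the correlation ignored, z falls from about 2.63 to about 2.02, which would no longer reach the 1 per cent level.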


13. EXAMPLES 

1. An experimental group consists of 60 subjects and a 
control group consists of 40 subjects. The experimental group 
is given a training period and then both groups are tested during 
a critical test period. A particular response is under investigation. The appearance of this response is recorded as a success and the nonappearance as a failure. In the experimental group 24 successes are recorded, and in the control group 8 successes are recorded. (a) Under the hypothesis of random sampling from a common population, test the significance of the difference between the proportions of successes in the 2 groups, making the necessary correction for continuity and using the table of the normal curve for your solution. (b) Make the same test without correcting for continuity and see whether your conclusion would be any different. Note that the manner in which the hypothesis is stated indicates that a two-tailed test is being made in each instance.

2. In a general psychology class of 100 students, 80 succeed 
in passing an item in a test. In another similar class of 100 stu- 
dents, 60 pass the same item. Under the hypothesis of random 
sampling from a common population, test the significance of the 
difference between the proportions passing the item in the 2 
classes. Make the necessary correction for continuity and use the 
table of the normal curve in testing the hypothesis. Note that 
this is also a two-tailed test. 

3. In a group of 50 maladjusted students, 32 give an agree response to an item in a personality schedule. A group of normal controls is also tested, and it is found that 35 give an agree response and 55 give a disagree response. Test the hypothesis that the 2 samples are from a common population by means of the χ² test. Make the necessary correction for continuity. Note that the probability attaching to the obtained value of χ² is that which would be obtained in making a two-tailed normal-curve test and is therefore consistent with the hypothesis being tested.

4. In a study by Hellman (1914) it is reported that of 20 breast-fed youngsters, 4 had normal teeth and 16 showed malocclusion. Of 22 bottle-fed youngsters, 1 had normal teeth and the other 21 showed malocclusion. In testing the hypothesis that these 2 groups are random samples from a common population, Yates (1934b) shows that the direct calculation of the probability yields a P of .287. We may test the same hypothesis by using the χ² test and determine how well the probability obtained by this method approximates the probability obtained directly. Do not forget to make the correction for continuity before computing χ². Carry along sufficient decimal places so that errors of rounding will not be too great. The probability attaching to the obtained value of χ² may be found by taking the square root of χ² to obtain z and then entering the table of the normal curve with the value of z. The probability desired will be double the probability read from the normal curve table, as the hypothesis involves a two-tailed test.

5. In a factory where a group of 84 men and a group of 52 
women are doing exactly the same work, it is found that over a 
period of time 12 of the men and 5 of the women have accidents. 
Test the hypothesis that the relative frequency of accidents is 
the same for both sexes. Correct for continuity and use χ² in
making the test. 

6. In a class of 80 students, it is found that 48 succeed in 
passing an item on a test. In comparing the proportion passing 
this item with the proportion of the class passing another item, 
it is found that 42 out of the 60 passing the second item also 
passed the first item, i.e., were included in the 48 passing the 
first item. Since we are testing the same group of subjects, 
correlation may be expected between the 2 proportions. Test 
the hypothesis that the 2 samples are from a common population. 
You will find that χ² corrected for continuity is easier to calculate
than z. 
7. Fifty-five subjects are given 2 shades of blue which 
differ only slightly with respect to saturation. They are asked 
to select the shade with the greater saturation, and 36 of the 
subjects make the correct selection. The same subjects are then 
retested with 2 shades of red which differ only slightly with 
respect to saturation. On this test it is found that 31 make the 
correct selection and that 24 of the 31 are also included in the 36 
who made the correct selection in the test with blue. Calculate χ², taking into account the correlation and making the necessary correction for continuity.

8. A sample of 500 adults in an opinion poll shows that 300 are in agreement with a given issue and 200 in disagreement. In the sample we also have 370 who agree with a second issue and 130 who disagree. Of the 300 who agree to the first issue, 250 also agree with the second. Calculate χ², taking into account the correlation and correcting for continuity.

9. The following data consist of the frequency of responses of the same group of subjects to 2 test items. Eighty-two of the 132 subjects pass Item 1 and the remainder fail. Seventy-two of the 132 subjects pass Item 2 and the remainder fail. There are 62 subjects who passed both items. Test the hypothesis that the samples are from a common population by means of χ², taking into account the correlation and making the necessary correction for continuity.

10. Merritt and Fowler (1948) report an experiment in 
which the procedure was as follows: “stamped, self-addressed, and 
sealed letters of two types were ‘lost’ by depositing them promi- 
nently but discreetly on the sidewalks of various cities in the East 
and Midwest. Type A contained only a trivial message, while 
Type B contained, besides a message, a lead slug of the dimensions 
of a fifty-cent piece. The accompanying message indicated that 
the lead disk, as such, was of value to the addressee. Care was 
taken to drop the letters in locations sufficiently removed from 
one another to preclude the possibility of any one person finding 
more than one of the letters. All were put down in clear weather 
so that the envelopes would not become soiled and hence lose their 
appearance of value. Tests were made by night and day in both 
business and residential districts" (pp. 90-91). Thirty-three 
letters of Type A were dropped, and of these, 28 were returned 
by the person picking them up. One hundred and fifty-eight letters of Type B were dropped, and of these, 86 were returned. Calculate χ², making the necessary correction for continuity, and interpret your results.

11. The “Zeigarnik effect," which is concerned with the rel- 
ative degree of recall of interrupted and completed tasks, has 
been studied by many psychologists. Tasks are presented to the 
subject and he is allowed to complete half of them and is inter- 
rupted on the other half. After the experimental session, the 
subject is asked to recall the tasks upon which he has worked. A 
measure frequently used in such studies is the ratio (RI/RC) of 
the number of interrupted tasks recalled to the number of com- 
pleted tasks recalled. Lewis (1944) and Lewis and Franklin 
(1944) tested a group of 12 subjects under the usual conditions 
and found the median value of the RI/RC ratio to be .67. Thus, 
6 subjects had values below .67, and 6 subjects had values above 
this figure. A second group of 14 subjects was tested under a 
cooperative work condition in which a co-worker was permitted 
to complete the tasks which had been interrupted for the subject. 


Under this test condition, 12 subjects had an RI/RC ratio greater than .67, and 2 had values less than .67. From the data given, a 2 × 2 table may be set up in which the cell entries will be 6, 6, 2, and 12. (a) Find the value of χ² corrected for continuity. (b) Use the table of the normal curve to find the value of P corresponding to χ².

12. Lewis and Burke (1949), in their excellent discussion of the χ² test in psychological research, recommend that the test not be applied to the 2 × 2 table when theoretical cell frequencies of less than 10 are involved. Thus, they believe that the application of the χ² test to the data of Example 11 is unwarranted, since all 4 of the expected cell frequencies are less than 10 and 2 are somewhat smaller than 5. It has been suggested in this chapter that we shall not be seriously in error in applying the χ² test to the 2 × 2 table as long as we correct for continuity and as long as no theoretical cell frequency is less than 5. It was also suggested that when theoretical frequencies close to 5 were involved and when the obtained value of χ² was of borderline significance, it would be better to calculate the value of P by the direct method described in the chapter. Although the value of χ² as obtained in Example 11 leaves little doubt about the conclusion concerning significance, it is worthwhile as an exercise to calculate the value of P by the direct method. You will find that the probabilities obtained by the two methods are in close agreement.

13. Kuenne (1946) studied transposition behavior in 2 groups of children who differ with respect to age. Group 1 consisted of 18 children ranging in age from approximately 34 to 46 months. Group 2 consisted of 26 children ranging in age from approximately 61 to 63 months. In the critical test trials, 3 of the children in Group 1 showed transposition behavior and 15 did not. In Group 2, the number showing transposition behavior in the critical test trials was 20, while 6 failed to meet the criterion. From the data given it is possible to set up a 2 × 2 table in which the cell entries will be 3, 15, 20, and 6. (a) Find the value of χ² corrected for continuity. (b) Use the table of the normal curve to find the value of P corresponding to χ².


14. The obtained value of χ² in Example 13 is highly significant, and there is no reason to believe that the conclusion concerning significance would be changed by the value of P obtained by the direct method. To familiarize yourself with the direct method of obtaining the probability for the 2 × 2 table, however, use it with the data of Example 13. You will again find that, although 2 of the expected cell frequencies are less than 10, the agreement between the probabilities obtained by the two methods is very satisfactory.


CHAPTER 6 


The Application of the χ² Distribution in
Research Problems Involving More Than 
1 Degree of Freedom 


1. INTRODUCTION 


The methods of the previous chapters are applicable to studying the distribution of discrete variables that can be classified in 1 of 2 ways. The farmer's response in the experimental situation, for example, could be classified as correct or incorrect, a success or a failure. Similarly, these methods may be used to study the appearance or nonappearance of a particular response, choice, or behavior pattern in a given situation. We are now ready, however, to extend the methods described to research problems in which discrete variables may be classified in 1 of 3 or more ways.

Such problems are common enough in research. A manufacturer is interested in the distribution of choices of a random sample of a defined population among 3 types of packages in which he may put his product on the market. Which type of package
will be selected most frequently? Are there significant differences 
in the distribution of choices? In an industrial plant, records 
may be available of the frequency of accidents by hours, by days, 
and by months. Are there significant differences present in the 
frequencies during the various hours of the working day? Can 
we assume that the frequencies during the various months of the 
year do not differ significantly? 

In a breeding experiment, a cross between 2 plants results 
in 352 seedlings. According to genetic theory, the seedlings 
should segregate into 4 types in the ratio of 9:3:3:1. The observed 
frequencies for the 4 types are 200, 72, 60, and 20, respectively. 



Are the data in accord with theory? Are the deviations from 
expectancy sufficiently great to render the theory untenable? 


2. A STUDY OF PREFERENCES 


To illustrate the application of the χ² distribution to problems such as those described above, let us take a very simple case. A random sample of 120 subjects is shown 3 shades of blue. Each subject is asked to express his preference for 1 of the 3 shades. The distribution of preferences is as follows:


Shade 1 Shade 2 Shade 3 
60 35 25 


What sort of hypothesis may we be interested in testing here? 

Not knowing anything else about the distribution of prefer- 
ences, we may specify by hypothesis that the distribution will be 
according to chance. This hypothesis specifies that we should 
expect an equal number of choices for each of the 3 shades and 
that such differences as are observed are the result of random 
sampling. The null hypothesis which we desire to test, therefore, 
is that there is no difference between the preferential values of the 
3 shades of blue. Under this hypothesis, the 120 choices would 
be expected to be distributed with 40 for each of the 3 shades. 
Then,

$$\chi^2 = \frac{(60 - 40)^2}{40} + \frac{(35 - 40)^2}{40} + \frac{(25 - 40)^2}{40} = 10.000 + .625 + 5.625 = 16.25$$


The only restriction placed upon the data in this problem is 
that the sum of the frequencies must equal N, the total number of 
observations. Hence, only 2 of the frequencies are free to vary, 
for the third can be obtained by subtraction of the other 2 from 
N. We thus have 2 degrees of freedom for evaluating the obtained χ² of 16.25. In general, in problems of this sort, the number of degrees of freedom available will be equal to the number of categories minus 1.²

¹ The theoretical proportions selecting each of the 3 shades are specified by an a priori hypothesis; they are not estimated or obtained from the data in any way.


By reference to the table of χ², we find that for 2 degrees of freedom a value of 9.210 or larger would occur as a result of random sampling, under the conditions specified, 1 time in 100. The probability of obtaining a value as large as 16.25 is therefore less than .01, and the obtained χ² may be regarded as significant.
The hypothesis of equal preference would be rejected. The data 
indicate significant deviations from expectancy in terms of the 
hypothesis tested, and Shade 1 seems to be the preferred choice. 
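The goodness-of-fit computation can be sketched in Python (illustrative; `chi_square_gof` is a hypothetical name). For exactly 2 degrees of freedom the tabled probability has the closed form P = exp(−χ²/2), which is why the tabled value for P = .01 is 9.210 = −2 ln .01; for other numbers of degrees of freedom this shortcut does not apply, and the general table of χ² is needed.

```python
from math import exp


def chi_square_gof(observed, expected):
    """Sum of (o - e)^2 / e over the categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [60, 35, 25]                          # preferences for Shades 1, 2, 3
n = sum(observed)
expected = [n / len(observed)] * len(observed)   # equal preference: 40 each
chi2 = chi_square_gof(observed, expected)
df = len(observed) - 1                           # categories minus 1

# Valid only for df = 2: upper-tail probability of chi-square is exp(-chi2 / 2).
p = exp(-chi2 / 2)
print(chi2, df)      # 16.25 2
print(p < 0.01)      # True
```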


3. A STUDY OF INDUSTRIAL ACCIDENTS 


Let us take another example, similar to the one above, but 
in another area of application. In an industrial plant a record is 
kept of the accidents which occur. The record shows not only 
the number of accidents which each employee has, but the serial 
order of his accidents and the time at which each occurred. In 
Table 7 we have given the frequency distribution of first accidents 
for 144 employees according to the hours in which they occurred. 


TABLE 7. Frequency Distribution of First Accidents per Hour in an Industrial 
Plant for a Given Period of Time 


Hours        8    9   10   11    1    2    3    4   Total
Frequency   25   16   10    …    …    …    …   22     144


Without prior knowledge concerning the expected distribution of 
accidents, we may set up the hypothesis of a uniform distribution. 
We do not know, in advance, let us say, of any reason why dif- 
ferences should be observed in the frequency of accidents from 
hour to hour of the working day. In the absence of any such 
knowledge, we specify by hypothesis that an equal number of 
accidents should occur during each of the various hours and that 

² No correction for continuity was applied in this problem. This will be the case with all values of χ² based on more than 1 degree of freedom. Yates (1934b) has pointed out that for tables with more than 1 degree of freedom the correction becomes of minor importance.




any differences that are observed are the result of chance deviations from expectancy. Under the hypothesis stated, the expected frequency of accidents for each of the hours would be the total number of accidents divided by the number of hours. Thus, the expected frequency for each of the hours would be 144/8 = 18. Then,

$$\chi^2 = \frac{(25 - 18)^2}{18} + \frac{(16 - 18)^2}{18} + \frac{(10 - 18)^2}{18} + \cdots + \frac{(22 - 18)^2}{18} = 8.89$$


Again the only restriction placed upon the data is that the sum of the frequencies must equal 144. This means that 7 of the 8 cell entries are free to vary and that the remaining entry may be determined by subtraction of the other 7 from N. Thus, we have 7 degrees of freedom for evaluating the obtained χ² of 8.89. By reference to the table of χ², we find that a value of 14.067 has a probability of .05 for 7 degrees of freedom. The probability of our obtained value of 8.89 is somewhere between .30 and .20, and it is thus not significant. The data at hand, we must conclude, do not offer any evidence against the hypothesis of a uniform distribution.
It is apparent, however, from an examination of the distribution, that accidents seem to occur more frequently during the first hour, the beginning of the afternoon, and the hours just before the beginning of a break in the working period, i.e., the hours 8, 11, 1, and 4. Let us determine whether or not these particular hours differ significantly from the remaining hours.

The sum of the frequencies for the hours 8, 11, 1, and 4 is equal to 87, and for the hours 9, 10, 2, and 3 the total number of accidents is 57. Four hours enter into each of these totals, and under the hypothesis of no difference between the 2 periods, the expected frequency in each of the 2 periods will be 144/2 = 72. Then χ² will be given by


$$\chi^2 = \frac{(87 - 72)^2}{72} + \frac{(57 - 72)^2}{72} = 6.25$$


Since we have only 1 degree of freedom for the χ² obtained here, however, we may apply the correction for continuity and find

$$\chi^2 = \frac{(86.5 - 72.0)^2}{72} + \frac{(57.5 - 72.0)^2}{72} = 5.84$$


For 1 degree of freedom, we find that the P for χ² equal to 5.84 is less than .05, and hence the obtained value is significant.
The hypothesis of an equal number of accidents in the 2 periods 
is not tenable. The data deviate significantly from expectancy 
under the hypothesis of no difference between the 2 periods. 
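The 2-period comparison, with and without the correction for continuity, can be reproduced in a short Python sketch (illustrative; the function names are hypothetical). For 1 degree of freedom the tabled probability of χ² is the area in both tails of the normal curve beyond z = √χ², which `math.erfc` gives directly.

```python
from math import erfc, sqrt


def chi_square_counts(observed, expected, correct=False):
    """Sum of |o - e|^2 / e; with correct=True each |o - e| is first
    reduced by .5, the continuity correction used when df = 1."""
    total = 0.0
    for o, e in zip(observed, expected):
        dev = abs(o - e)
        if correct:
            dev = max(dev - 0.5, 0.0)
        total += dev ** 2 / e
    return total


def p_value_1_df(chi2):
    """Upper-tail probability of chi-square with 1 df, i.e. the area in both
    tails of the normal curve beyond z = sqrt(chi2)."""
    return erfc(sqrt(chi2 / 2))


observed = [87, 57]   # hours 8, 11, 1, 4 versus hours 9, 10, 2, 3
expected = [72, 72]   # 144 / 2 under the hypothesis of no difference

raw = chi_square_counts(observed, expected)
corrected = chi_square_counts(observed, expected, correct=True)
print(raw, round(corrected, 2))               # 6.25 5.84
print(round(p_value_1_df(corrected), 3))      # 0.016
```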

Now the calculation of χ² here is straightforward and clear, but the source of the hypothesis and the arrangement of the data are not in accordance with good experimental procedures. It may be observed that we have selected the hours with the largest frequencies and compared these with the ones with the smallest frequencies. We have selected but 1 out of 70 such comparisons that might be made. The particular one which we have selected happens to be the one which would give the greatest deviations from expectancy and, thus, the largest possible value of χ² which could be obtained from comparisons of the sort described for the data at hand. Even when the hypothesis being tested is true, we would expect, by random sampling, to obtain about 3 or 4 significant values of χ² (P less than .05) in 70 such tests of the hypothesis, since (70)(.05) = 3.5.
If this particular comparison had not been planned prior to the examination of the data, it would be much sounder to test it with a new set of data obtained, perhaps, from another plant or over a new period of time. On the other hand, psychological theory might have indicated that the period of beginning work is a “warming-up” period and one in which accidents are likely to occur; similarly, that the period just before a break in the work is one in which employees are anticipating the rest period and are more careless than during the other hours. If theory did indicate these conditions, then the particular hours selected for comparison would be logically justified. The comparison would be one, however, that was planned and incorporated into the problem prior to the examination of the accident records.


102  EXPERIMENTAL DESIGN IN PSYCHOLOGICAL RESEARCH 


4. A STUDY OF VOCATIONAL ADVISEMENT 


In a follow-up study of veterans receiving vocational advisement, 3 separate samples of 100 cases each were taken from the files of a college guidance center.³ Sample 1 was drawn from the records of July and August, 1944; Sample 2 was taken from the records for January, 1945; Sample 3 was taken during June, 1945. The data given in Table 8 show the disposition of the cases referred by the Veterans Administration to the guidance center.


TABLE 8. Disposition of Cases in 3 Separate Samples by a Guidance Center

             Training   Declared Not to   Advisement   No
             Program    Be in “Need”      Not          Follow-Up
             Planned    of Training       Completed    Data         Total
Sample 1        69            8               12          11          100
Sample 2        67            2                4          27          100
Sample 3        70            1                4          25          100
Total          206           11               20          63          300


Let us test the hypothesis that these samples are random samples from a common population. Under this hypothesis, the pooled frequencies for all samples will provide the estimates of the proportions to be expected in each of the 4 categories. Thus, 206/300 = .6867 will be the proportion expected in the category "training program planned." The proportion expected in the category "declared not to be in ‘need’ of training" will be given by 11/300 = .0367. The proportion expected in the third category will be 20/300 = .0667, and 63/300 = .2100 is the proportion expected in the fourth category.

³ Long and Hill (1947).

Since the sample size is the same in all 3 samples, i.e., each of the n’s is equal to 100, if we multiply the proportions obtained above by 100, we shall have the expected frequencies in each of the categories for each of the samples. These expected frequencies are given in Table 9.

TABLE 9. Expected Frequencies for the Cell Entries of Table 8 Assuming Random Sampling from a Common Population

             Training   Declared Not to   Advisement   No
             Program    Be in “Need”      Not          Follow-Up
             Planned    of Training       Completed    Data
Sample 1      68.67          3.67            6.67         21.00
Sample 2      68.67          3.67            6.67         21.00
Sample 3      68.67          3.67            6.67         21.00

We may note that the expected frequencies are the same for each of the 3 samples within the 4 categories. Consequently, each of the (o − e)² values within a given category will be divided by the same expected frequency. This fact enables us to sum the values of (o − e)² within each category and then to multiply by the reciprocal of the expected frequency for the category in obtaining the various components of χ². The values of (o − e)² are given in Table 10.

TABLE 10. Squared Deviations of the Observed Frequencies of Table 8 from the Expected Frequencies of Table 9

             Training   Declared Not to   Advisement   No
             Program    Be in “Need”      Not          Follow-Up
             Planned    of Training       Completed    Data
             (o − e)²     (o − e)²         (o − e)²     (o − e)²
Sample 1       .1089       18.7489          28.4089      100.0000
Sample 2      2.7889        2.7889           7.1289       36.0000
Sample 3      1.7689        7.1289           7.1289       16.0000
Total         4.6667       28.6667          42.6667      152.0000

Then χ² will be given by


$$\begin{aligned}
\chi^2 &= \frac{1}{68.67}(4.6667) + \frac{1}{3.67}(28.6667) + \frac{1}{6.67}(42.6667) + \frac{1}{21.00}(152.00)\\
&= (.0146)(4.6667) + (.2725)(28.6667) + (.1499)(42.6667) + (.0476)(152.00)\\
&= .0681 + 7.8117 + 6.3957 + 7.2352\\
&= 21.5107
\end{aligned}$$




The number of degrees of freedom for this problem will be given by (r − 1)(c − 1) = (3 − 1)(4 − 1) = 6, and from the table of χ² we find that a value of 16.812 is significant with a probability of .01. The obtained value of 21.5107 will thus have a probability of less than .01 and may be regarded as significant. The hypothesis that the samples have been randomly drawn from a common population will be rejected.
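The same test can be sketched in Python for any r × c table, assuming expected frequencies are formed from the marginal totals as row total × column total / N (equivalent to the pooled-proportion method used here). Because the sketch keeps the expected frequencies unrounded instead of using reciprocals rounded to 4 places, it should print a value near 21.52 rather than exactly 21.5107; the conclusion at the .01 level is the same. `contingency_chi_square` is a hypothetical helper name, and 16.812 is the tabled .01 value quoted above.

```python
# Table 8: 3 samples (rows) by 4 disposition categories (columns).
table = [
    [69, 8, 12, 11],  # Sample 1
    [67, 2, 4, 27],   # Sample 2
    [70, 1, 4, 25],   # Sample 3
]

def contingency_chi_square(rows):
    """Return (chi-square, df), with expected = row total * column total / N."""
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(rows):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            chi2 += (o - e) ** 2 / e
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return chi2, df

chi2, df = contingency_chi_square(table)
print(f"chi-square = {chi2:.3f} with {df} df")
print("significant at .01:", chi2 > 16.812)  # tabled .01 value for 6 df
```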

The large components making up the obtained value of χ² are apparently the result of the deviant frequencies in Sample 1 in categories 2, 3, and 4. More of the individuals in this particular sample were declared not in “need” of training, a larger frequency was observed where the advisement was not completed, and there were fewer cases with no follow-up data. One possibility is that if follow-up data were available for more of the cases in Samples 2 and 3, then the discrepancies between Sample 1 and the 2 other samples would not be so great. If such data were available, the cases might be distributed in categories 2 and 3, making the samples more comparable. One would like to know the reason for the increase in frequencies in the no-follow-up-data category in the second and third samples, but the test of significance, of course, cannot answer this question. The test of significance does indicate that the distributions of frequencies in the various samples deviate significantly from expectancy in terms of random sampling from a common population.


5. THE j × 2 TABLE


Nine sections of an introductory course are offered at a university. A section is offered at every class hour, from 8 in the morning through 4 in the afternoon. The size of the sections is limited; the maximum number in any section is around 50 students. Since every student at the university is given an academic aptitude test, it is possible to classify the students in the various sections as being "below the college average" or "above the college average" on the test. The frequencies of students above and below average for each section are given in Table 11.

TABLE 11. Frequency of Students Below Average and Above Average on a College Aptitude Test in 9 Sections of an Introductory Course

 (1)        (2)          (3)          (4)          (5)        (6)
 Section    No. Below    No. Above    Total in
 Hour       Average      Average      Section
            (n_a)        (n_b)        (n)          (n_a)²     (n_a)²/n
  8           18           32           50           324       6.480
  9           20           25           45           400       8.889
 10           22           20           42           484      11.524
 11           19           19           38           361       9.500
 12           14           22           36           196       5.444
  1           21           19           40           441      11.025
  2           22           21           43           484      11.256
  3           16           20           36           256       7.111
  4           10           20           30           100       3.333
 Sum         162          198          360                    74.562
             (N_a)        (N_b)        (N)

Let us set up the hypothesis that the 9 sections of Table 11 are from a common population. With no prior knowledge about the proportion above and below average in the population, our best estimates would be obtained by pooling the frequencies of all 9 sections. We would then have 162/360 = .45 as the expected proportion classified as below average and 198/360 = .55 as the expected proportion classified above average. The number of cases in each section multiplied by these 2 proportions would give us the expected frequency below average and the expected frequency above average for the various sections. We could then proceed to find the value of χ² by one of the methods with which we are already familiar.
There is, however, a simplified method for calculating χ² in a j × 2 table which reduces the labor involved to a minimum.⁴


We take the column of frequencies of Table 11 with the smaller total. In the present instance, this is column 2, which has a total of 162 for the frequency of those below average. We now square each of the frequencies in column 2 to obtain the values in column 5. In column 6 we have divided the squares of the frequencies by the corresponding number of cases in the various sections. We find the sums indicated at the bottom of Table 11 and substitute in the following formula to obtain

$$\chi^2 = \left[\frac{N^2}{N_a N_b}\right]\left[\sum \frac{n_a^2}{n} - \frac{N_a^2}{N}\right] \qquad (31)$$

⁴ This method is based upon one devised by Brandt and Snedecor and is described in Snedecor (1946a), pp. 205-207. See also Mather (1947).


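Making the substitutions from Table 11 in formula (31) above, we obtain

$$\begin{aligned}
\chi^2 &= \left[\frac{(360)^2}{(162)(198)}\right]\left[74.562 - \frac{(162)^2}{360}\right] = \left[\frac{129{,}600}{32{,}076}\right]\left[74.562 - \frac{26{,}244}{360}\right]\\
&= (4.04)(74.562 - 72.900)\\
&= (4.04)(1.662)\\
&= 6.714
\end{aligned}$$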


The number of degrees of freedom available for the χ² of 6.714, according to a previous rule for problems of this sort, will be equal to the number of rows minus 1 multiplied by the number of columns minus 1, or (9 − 1)(2 − 1) = 8.⁵ By reference to the table of χ², we find that the obtained value of 6.714 is not significant (P is somewhat greater than .50), and the hypothesis tested must be regarded as tenable. There is no evidence that the various sections are not from a common population with a proportion of .45 below average and .55 above average.
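As a check on formula (31), the following standard-library Python sketch (the function names are illustrative, not from the text) computes χ² for Table 11 both by the shortcut and by the usual Σ(o − e)²/e with pooled proportions; the two should agree apart from floating-point error. The helper `chi2_sf_even_df` uses the closed-form tail series that holds only for an even number of degrees of freedom, which covers the 8 df here. With unrounded arithmetic the sketch gives χ² ≈ 6.717 and P ≈ .57.

```python
import math

# Table 11: students below and above the college average in the 9 sections (8 to 4 o'clock).
below = [18, 20, 22, 19, 14, 21, 22, 16, 10]
above = [32, 25, 20, 19, 22, 19, 21, 20, 20]

def chi_square_j_by_2(a, b):
    """Formula (31): N^2/(Na*Nb) * (sum(na^2/n) - Na^2/N)."""
    n = [x + y for x, y in zip(a, b)]
    big_n, n_a, n_b = sum(n), sum(a), sum(b)
    s = sum(x * x / k for x, k in zip(a, n))
    return big_n ** 2 / (n_a * n_b) * (s - n_a ** 2 / big_n)

def chi_square_direct(a, b):
    """Usual sum of (o - e)^2 / e with expected values from the pooled proportions."""
    big_n = sum(a) + sum(b)
    p_a = sum(a) / big_n  # .45 for Table 11
    total = 0.0
    for x, y in zip(a, b):
        n = x + y
        e_a, e_b = n * p_a, n * (1 - p_a)
        total += (x - e_a) ** 2 / e_a + (y - e_b) ** 2 / e_b
    return total

def chi2_sf_even_df(x, df):
    """Upper-tail P of chi-square; valid only for an even number of degrees of freedom."""
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

shortcut = chi_square_j_by_2(below, above)
direct = chi_square_direct(below, above)
df = (len(below) - 1) * (2 - 1)
print(f"formula (31): {shortcut:.3f}   direct: {direct:.3f}   df = {df}")
print(f"approximate P = {chi2_sf_even_df(shortcut, df):.2f}")
```

The agreement is exact in principle because, in each section, the deviation in the "above" cell is the negative of the deviation in the "below" cell, so the 2 cells can be combined algebraically into the single sum used in formula (31).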


6. PLANNING THE COMPARISONS TO BE MADE 


It may be the belief of some instructors that if students are allowed to select the hours for the sections, those who take the early-morning and late-afternoon sections may perhaps differ from those who take sections at other hours. χ² could be used to test the null hypothesis here also. It should be emphasized again, however, that theory should guide the choice of sections to be so tested and not an examination of the data. Obviously, if we examined the data of Table 11 and then selected the sections with the larger frequencies of students above average on the academic aptitude test and compared these sections with those with the larger frequencies below average, a maximum value of χ² for the particular data at hand would be obtained. If the choice of sections to be compared is determined by the data at hand, the test of significance should be based upon new observations. On the other hand, if theory or logical considerations lead to the choice of sections to be compared, this comparison will have been incorporated into the design of the experiment and will have been planned before the data have been examined.

⁵ If the restriction placed upon the data is that the marginal totals must remain the same, only 8 of the 18 cell entries of frequencies are free to vary; once these have been entered, the remaining 10 entries can be obtained by subtraction from the marginals.

For the sake of illustrating the test, let us assume that some 
consideration other than the examination of the data made a test 
of the difference between the combined early-morning and late- 
afternoon sections and all other sections desirable. Combining 
the 8 and the 4 o'clock sections, we have the data of Table 12. 


TABLE 12. Frequency of Students Below Average and Above Average in the 8 o'clock and the 4 o'clock Sections Compared to the Frequency Below Average and Above Average in All Other Sections on the Aptitude Test

                             No. Below   No. Above
                             Average     Average     Total
8 and 4 o'clock sections        28          52          80
All other sections             134         146         280
Total                          162         198         360
By means of formula (24) we then obtain

$$\chi^2 = \frac{360\left(\left|(52)(134) - (28)(146)\right| - \frac{360}{2}\right)^2}{(80)(280)(162)(198)} = 3.65$$
For 1 degree of freedom, the obtained value of 3.65 for χ² is not quite equal to the tabled value of 3.841, for which the value of P is .05. In terms of the customary significance standard, we would have to regard as tenable the hypothesis that the combined 8 and 4 o’clock sections do not differ from all other sections.
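For the 2 × 2 case, a sketch of the corrected formula as used above, assuming cells ordered a, b (first row) and c, d (second row). The `max(..., 0)` guard, which stops the correction from overshooting when |ad − bc| is smaller than N/2, is an addition of this sketch, and the function name is illustrative.

```python
def chi_square_2x2(a, b, c, d, continuity=True):
    """N(|ad - bc| - N/2)^2 / [(a+b)(c+d)(a+c)(b+d)]; the N/2 term is the continuity correction."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if continuity:
        diff = max(diff - n / 2, 0.0)  # guard added in this sketch
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Table 12: rows = 8 and 4 o'clock sections, all other sections; columns = below, above.
cells = (28, 52, 134, 146)
corrected = chi_square_2x2(*cells)
uncorrected = chi_square_2x2(*cells, continuity=False)
print(f"corrected = {corrected:.2f}, uncorrected = {uncorrected:.2f}, .05 point = 3.841")
```

Running it shows how close this comparison is to the .05 line: about 3.65 with the correction and about 4.16 without it, so the decision here depends on applying the correction for continuity.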


7. AN EXPERIMENT INVOLVING A TEST OF TECHNIQUE 


Sound engineers are well aware of the fact that rooms differ 
with respect to acoustics. In some rooms a speaker may be 
heard quite clearly by everyone in the room; in other rooms 
there may be "silent" spots in which the speaker may not be heard 
at all. Horton (1948) has attempted to devise a technique for 
measuring audibility of various rooms. His method consists of 
recording the voice of a speaker who reads a list of words. The 
recording is then played in the room being tested. The subjects 
seated in the room are given a printed multiple-choice test with 
4 words similar in sound to each word that has been recorded. 
They are supposed to select, from each set of 4 words, the 1 
word which they think they have just heard over the loudspeaker. 
For example, suppose the first recorded word is "father." The 
recorded voice says “Item 1." There is a short pause, and then 
the word "father" is heard by the subjects. They look at their 
printed test under Item 1 and find the words: “feather,” “further,” 
"farther," and "father." The task of the subjects is to pick the 
word they have just heard from the set of 4. 

One problem which was of concern was the extent to which 
the results of a set of observations obtained by this technique 
could be reproduced. Various uncontrolled factors could enter 
into the experimental situation, and since the experimental tech- 
nique was relatively new, not much was yet known about the 
kinds of controls necessary. A test of the technique was provided 
by repeating the experiment in the same room and under assumed 
comparable conditions with different groups of subjects. 

Let us take the frequencies for 1 word only. We shall count 
the response of a subject as “correct” if he selects the word heard 
and "incorrect" for any 1 of the 3 other possibilities selected. 
One room investigated seated 80 subjects, and another room 
seated 100 subjects. The frequencies of correct and incorrect 
choices for 4 groups of subjects tested in Room 1 and for 4 groups 
of subjects tested in Room 2 are given in Table 13. 

TABLE 13. Frequency of Incorrect and Correct Responses to a Test Item for 4 Samples of 80 Subjects Each Tested in Room 1 and for 4 Samples of 100 Subjects Each Tested in Room 2

                Group   No. Incorrect   No. Correct   Total
Room 1            1          23              57          80
                  2          30              50          80
                  3          26              54          80
                  4          20              60          80
                Total        99             221         320
Room 2            1          44              56         100
                  2          52              48         100
                  3          48              52         100
                  4          40              60         100
                Total       184             216         400

What hypotheses are we interested in testing with respect to the data of Table 13? We could test the hypothesis that the subjects respond to the word by chance. Since 4 alternatives have been provided, if we assume that each alternative is equally likely to be selected, the probability of selecting the correct word by chance will be 1/4 and the probability of selecting a wrong word will be 3/4. Under this hypothesis, the expected frequency of correct responses for each group tested in Room 1 will be np = (80)(.25) = 20, and the expected frequency of incorrect choices will be 60. Four separate values of χ² could now be calculated, 1 for each group, and each value based upon 1 degree of freedom.⁶

⁶ The value of p was not determined in any way from the data, and the marginal totals 99 and 221 are not fixed as they are when the hypothetical value of p is determined from pooling the frequencies.

The smallest of the 4 values of χ² that may be calculated will be given by the data of Group 2, for the observed frequencies for this group deviate least from expectancy. If the value of χ² calculated for this group is significant, then we know that the 3 other values will be significant also. Correcting for continuity, we have for this group

$$\chi^2 = \frac{(30.5 - 60.0)^2}{60.0} + \frac{(49.5 - 20.0)^2}{20.0} = 58.016$$

The value of 58.016 is highly significant for 1 degree of freedom, and it is obvious that the frequency of correct choices for the various groups tested in Room 1 exceeds chance expectancy. A similar test for the groups in Room 2 would also result in the rejection of the chance hypothesis.
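A short sketch of the chance-hypothesis test for each Room 1 group, assuming, as the text does, p = 1/4 for a correct choice and the correction for continuity applied to each of the 2 cells. The function name is illustrative. It confirms that Group 2 gives the smallest χ² (about 58.017, matching the 58.016 above to rounding), so every group exceeds chance expectancy.

```python
# (incorrect, correct) counts for the 4 groups tested in Room 1 (Table 13).
room1 = {1: (23, 57), 2: (30, 50), 3: (26, 54), 4: (20, 60)}

def chance_chi_square(incorrect, correct, p_correct=0.25):
    """1-df chi-square against a fixed chance probability, with the continuity correction."""
    n = incorrect + correct
    expected = (n * (1 - p_correct), n * p_correct)  # (60.0, 20.0) when n = 80
    observed = (incorrect, correct)
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))

results = {group: chance_chi_square(*counts) for group, counts in room1.items()}
for group, value in results.items():
    print(f"Group {group}: chi-square = {value:.3f}")
print("smallest value from group", min(results, key=results.get))
```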

But one of the major problems which we set out to investigate was the extent to which the results obtained by the technique could be reproduced. Let us see if the hypothesis that the 4 groups tested in Room 1 are random samples from a common population is tenable. The theoretical value of p will now be determined from the pooled frequencies of correct responses for the 4 groups and will be equal to 221/320 = .691. Since the data are in the form of a j × 2 table, the calculation of χ² may be made most easily by means of formula (31).


TABLE 14. Data for Calculating χ² for the 4 Samples of 80 Subjects Each Tested in Room 1, Assuming Random Sampling from a Common Population

            (n_a)        (n_b)      (n)
            No.          No.
            Incorrect    Correct    Total    (n_a)²    (n_a)²/n
Group 1        23           57        80       529      6.6125
Group 2        30           50        80       900     11.2500
Group 3        26           54        80       676      8.4500
Group 4        20           60        80       400      5.0000
Sum            99          221       320               31.3125
              (N_a)        (N_b)     (N)


The preliminary calculations are shown in Table 14. Then, substituting in the formula, we obtain

$$\chi^2 = \left[\frac{(320)^2}{(99)(221)}\right]\left[31.3125 - \frac{(99)^2}{320}\right] = (4.6803)(.6844) = 3.203$$

The χ² of 3.203 is not significant for the 3 degrees of freedom available, and the variation within the groups tested in Room 1 may be assumed to be no greater than that expected in random sampling from a common population or from homogeneous material. This finding provides evidence that the experimenter is able to reproduce the results obtained with his technique in Room 1. The best estimate of the proportion of correct choices to be expected for the particular word under investigation in Room 1 is .691, and the 4 groups tested show nothing more than random variation about this value.

A similar test for the consistency of the technique may be made with the data for the 4 groups tested in Room 2. For this room, the theoretical proportion of correct choices will be given by the pooled frequency 216 divided by 400. This is equal to .54. The value of χ² may be found by means of formula (31), and the preliminary calculations are given in Table 15.

TABLE 15. Data for Calculating χ² for the 4 Samples of 100 Subjects Each Tested in Room 2, Assuming Random Sampling from a Common Population

            (n_a)        (n_b)      (n)
            No.          No.
            Incorrect    Correct    Total    (n_a)²    (n_a)²/n
Group 1        44           56       100      1936      19.36
Group 2        52           48       100      2704      27.04
Group 3        48           52       100      2304      23.04
Group 4        40           60       100      1600      16.00
Sum           184          216       400                85.44

Substituting in the formula, we obtain


$$\chi^2 = \left[\frac{(400)^2}{(184)(216)}\right]\left[85.44 - \frac{(184)^2}{400}\right] = (4.0258)(.80) = 3.221$$

For 3 degrees of freedom, the value of 3.221 is not significant, 
and again the experimenter may have confidence that he is able 
to reproduce the results obtained with his technique in Room 2. 
The 4 groups tested in this room show no more than random 
variation to be expected in sampling from a common population 
in which the estimated proportion of correct choices is .54. 
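The two reproducibility tests can be run together with formula (31). In this sketch, with illustrative names, the incorrect counts are taken as n_a because that column has the smaller total in each room, as in Tables 14 and 15 (either column gives the same χ²; the smaller one simply eases hand computation). Each result is compared with 7.815, the usual tabled .05 value of χ² for 3 degrees of freedom, which is supplied here rather than quoted in the text.

```python
def chi_square_j_by_2(a, b):
    """Formula (31): N^2/(Na*Nb) * (sum(na^2/n) - Na^2/N)."""
    n = [x + y for x, y in zip(a, b)]
    big_n, n_a, n_b = sum(n), sum(a), sum(b)
    s = sum(x * x / k for x, k in zip(a, n))
    return big_n ** 2 / (n_a * n_b) * (s - n_a ** 2 / big_n)

# (incorrect, correct) counts for the 4 groups in each room, from Table 13
rooms = {
    "Room 1": ([23, 30, 26, 20], [57, 50, 54, 60]),
    "Room 2": ([44, 52, 48, 40], [56, 48, 52, 60]),
}
CRITICAL_05_3DF = 7.815  # tabled .05 value of chi-square for 3 df

values = {room: chi_square_j_by_2(*counts) for room, counts in rooms.items()}
for room, value in values.items():
    verdict = "not significant" if value < CRITICAL_05_3DF else "significant"
    print(f"{room}: chi-square = {value:.3f} with 3 df, {verdict} at .05")
```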




Satisfied that the experimental technique is such that stable results are obtained within each of the 2 rooms, the experimenter proceeds now to test the difference between the 2 rooms.⁷ This test is made by pooling the frequencies within each room to obtain the data shown in Table 16.

TABLE 16. Frequency of Correct and Incorrect Responses to a Test Item for the 320 Subjects Tested in Room 1 and the 400 Subjects Tested in Room 2

             No.          No.
             Incorrect    Correct    Total
Room 1          99          221        320
Room 2         184          216        400
Total          283          437        720

The hypothesis to be tested is that the 2 samples, based upon 320 and 400 observations, have been randomly drawn from a common population.⁸ The value of χ², remembering the correction for continuity, may be obtained by means of formula (22). Substituting, we obtain


$$\chi^2 = (26.8 - .5)^2(.0079 + .0064 + .0052 + .0041) = (691.69)(.0236) = 16.32$$
The observed value of χ², 16.32, is highly significant for 1 degree of freedom, and the hypothesis tested must be regarded as untenable. If it may be assumed that the technique used by the investigator measures the audibility level of the 2 rooms, Room 1 is significantly superior to Room 2.
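Formula (22) can be sketched directly from Table 16. With unrounded expected frequencies (about 125.78, 194.22, 157.22, and 242.78) the sketch gives χ² ≈ 16.28 rather than 16.32; the small difference comes from rounding |o − e| to 26.8 and the reciprocals to 4 places, and it does not affect the conclusion. The variable names are illustrative.

```python
# Table 16: rows are Room 1 and Room 2; columns are incorrect and correct.
observed = [[99, 221], [184, 216]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# In a 2 x 2 table every cell deviates from its expectation by the same amount,
# so formula (22) needs only one |o - e| and the sum of the 4 reciprocals.
deviation = abs(observed[0][0] - expected[0][0])
reciprocal_sum = sum(1 / e for row in expected for e in row)
chi2 = (deviation - 0.5) ** 2 * reciprocal_sum
print(f"|o - e| = {deviation:.2f}, chi-square = {chi2:.2f} with 1 df")
```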


⁷ If the hypotheses of random sampling from a common population within each of the 2 rooms had not been tenable, this comparison would not be justified. It is legitimate to combine the data from several samples to test the hypothesis of random sampling from a common population, and if this hypothesis is tenable, it is legitimate to treat the pooled data as a single sample. But if the hypothesis of random sampling from a common population is rejected, the pooled data obviously cannot be taken as a single sample representative of a common population. Thus, if one or the other or both of the values of χ² obtained in the test of technique had been significant, the comparison between the 2 rooms would not be meaningful.

⁸ This test is identical with the test of the hypothesis that the frequency of correct response is independent of the room classification.




The value of providing a test of experimental technique 
has been well summarized by Snedecor (1946a): 


When novel experimental methods are used, the investigator must 
determine, among other things, whether he can reproduce his results; 
that is, whether he has adequate control over the conditions under which 
the experiment is performed.... If results cannot be verified under 
controlled conditions assumed to be identical, then it is idle to try the 
effect of changing these conditions—one cannot know whether differences 
(or likenesses) in the results are to be charged to the controlled situation 


or to those unknown causes that elude control.⁹


8. χ² FOR MORE THAN 30 DEGREES OF FREEDOM

The table of χ² (Table IV, Appendix) provides entries for degrees of freedom equal to 30 or less. For higher values of degrees of freedom, we may take

$$z = \sqrt{2\chi^2} - \sqrt{2(df) - 1} \qquad (32)$$

The value of z obtained above is distributed approximately in a normal fashion with zero mean and unit standard deviation.¹⁰ Hence, the table of the normal curve may be entered with the value of z obtained by means of formula (32).

Suppose, for example, that we obtain a value of χ² from a 10 × 5 table. Here we have, according to rule, (9)(4) = 36 degrees of freedom. If the obtained value of χ² is equal to 54.5, then

$$z = \sqrt{(2)(54.5)} - \sqrt{(2)(36) - 1} = 10.440 - 8.426 = 2.014$$


From the table of the normal curve, we find that the area in the right tail cut off at an ordinate erected at z equal to 1.96 is .025, and similarly that the area cut off in the left tail at an ordinate of −1.96 is .025. Thus, if the value of z obtained by means of formula (32) equals or exceeds 1.96, the obtained value of χ² may be regarded as significant, for the value of P attaching to the obtained χ² will be equal to or less than .05. In the example cited, the obtained value of z exceeds 1.96, and hence the hypothesis being tested would be regarded as untenable.

⁹ Snedecor (1946a), p. 203.
¹⁰ Fisher (1936), p. 63.
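Formula (32) and the normal-curve lookup can be sketched with the Python standard library, using `math.erfc` for the normal tail area; the function names are illustrative. Because only large values of χ² count against the hypothesis, the sketch reports the area in the right tail alone. For the 10 × 5 example it gives z ≈ 2.014 and an approximate P of about .022, below .05, as the text concludes.

```python
import math

def fisher_z(chi2, df):
    """Formula (32): z = sqrt(2 * chi2) - sqrt(2 * df - 1)."""
    return math.sqrt(2 * chi2) - math.sqrt(2 * df - 1)

def upper_tail_normal(z):
    """Area of the unit normal curve to the right of z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# The 10 x 5 example: (10 - 1)(5 - 1) = 36 degrees of freedom and chi-square = 54.5.
df = (10 - 1) * (5 - 1)
z = fisher_z(54.5, df)
p = upper_tail_normal(z)
print(f"df = {df}, z = {z:.3f}, approximate P = {p:.3f}")
```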


9. EXAMPLES 


1. Records were kept of the number of students who left a 
university auditorium through each of the 3 main exits. For a 
sample of 795 students the counts were as follows: 


Exit 1 Exit 2 Exit 3 
No. students using 245 200 350 


What hypothesis might be tested here? Compute the value of χ² and determine whether or not it is significant.

2. A reasoning problem which involved clamping together 
2 sticks so that the length was just sufficient to wedge the up- 
right between the floor and the ceiling of an experimental room 
was used in an investigation by Maier (1945). The subjects 
were instructed to construct a hat rack from the materials sup- 
plied, and the solution was as described above, the projection of 
the clamp from the boards providing the necessary hook for 
hanging up a hat or coat. Men and women were used as subjects, 
and they were tested under 3 different experimental conditions— 
the conditions involving different clues as to the solution of the 
problem. The data reported here are the totals for all 3 conditions. 


Failed to Solve Succeeded in Solving Total 
Men 13 26 39 
Women 26 10 36 


Test the independence of ability to solve this problem and sex classification by means of χ², making the necessary correction for continuity.

3. In an opinion poll conducted by Edwards (1947) for the University of Washington Medical School, 2 samples of the adult population were drawn. One of these was an areal sample of the sort used in the survey work of the Census Bureau. The other sample was a quota control sample of the type used by Gallup. Both samples were asked the same set of questions. The city sampled was Seattle. The data given below are for one of the questions used in the survey. Test the hypothesis that these samples (areal and quota) are random samples from a common population, making the correction for continuity. The question asked was: Do you own a pet?


No Yes Total 
Quota 192 136 328 
Areal 136 110 246 


4. Another question asked in the same poll was: Would you approve or disapprove of using live dogs and cats for research upon the problem of cancer? The responses of members of the 2 samples were as follows:

          Approve   Disapprove   Don't Know   Total
Quota       260         58           10        328
Areal       199         33           14        246

Test the hypothesis that these 2 samples are random samples from a common population.
5. In the same survey, subjects were asked to name the figure, 1 in 8, which was widely used in radio programs concerning the relative frequency of deaths from cancer. This was a multiple-choice question in which subjects were given a card with a number of alternatives. For this problem, we have merely recorded whether the response was correct, incorrect, or don’t know.

          Correct   Incorrect   Don’t Know   Total
Quota       185         86          57         328
Areal       152         65          29         246

Can we assume that the 2 samples are equally well informed with respect to this question?
6. It is of interest to determine whether the 2 samples differed with respect to certain background characteristics. For example, the distribution of ages, in intervals of 10 years, for the 2 samples was as follows:

Age        20    30    40    50    60    70 and over   Total
Quota      65    90    57    47    32        37         328
Areal      54    63    63    35    18        13         246

Test the hypothesis that the samples are from a common population.




7. Another background characteristic concerns level of 
education. The distributions for the 2 samples are as follows: 


Quota Areal Total 

No school and grammar incomplete 31 14 45 
Grammar school complete 35 37 72 
High school incomplete 73 40 113 
High school complete 81 84 165 
College incomplete 69 43 112 
College complete 39 28 67 
Total                                328 246 574


Test the hypothesis that the 2 samples are from a common 
population. 

8. Rosenzweig (1943) tested the recall of subjects for finished 
and unfinished tasks when they worked on the tasks under differing 
sets of instructions. An "informal" group worked under the as- 
sumption that the investigator was interested in studying work 
methods and that the ability of the subjects was not under investi- 
gation. A “formal” group worked on the same tasks under the 
impression that the problems were a kind of intelligence test. 
On some of the tasks, both groups of subjects were interrupted, 
and on other tasks they were allowed to work until the problem 
was completed. At the end of the experiment both groups of 
subjects were asked to recall the tasks upon which they had 
worked. The data for recall are as follows: 


No. Subjects No. Subjects No. Subjects 
Recalling Recalling with no 
Preponderance of Preponderance of Preponderant 
Finished Tasks Unfinished Tasks Tendency 


Informal group 7 19 4 
Formal group 17 8 5 


Compute χ² for the data.

9. In an experiment concerning the influence of a particular drug upon some physiological response, the drug is tested at two levels of concentration. The drug is administered by injection, and the experimenter is not sure of his technique, i.e., whether he would obtain the same or comparable results if comparable groups were tested a second time. In order to determine whether or not he can reproduce his results, 160 subjects are divided at random into 8 groups of 20 subjects each. Four of these groups are picked at random from the 8 and tested with the drug at one level of concentration. The 4 other groups are tested with the second level of concentration. The results are as follows:


Groups Reaction Absent Reaction Present 
3} 10 10 
12 8 
12 
5 
14 
12 
15 
14 


First level 
1 


Second level 


H- OQ» Oo Oto 
ANNA am 


(a) Can we assume that groups tested at the first level are from a common population, i.e., is the variation greater than would be expected from sampling homogeneous material? (b) Can we make similar assumptions concerning the groups tested at the second level of concentration? (c) If the groups tested at the first level are homogeneous, then they might be combined and the totals tested against the totals of the groups at the second level. (d) Would the test described in (c) be valid if the assumption of sampling from a common population were rejected? What would a significant χ² for either level indicate?

10. In a large r × c table, we may sometimes have more than 30 degrees of freedom available for evaluating χ². For situations such as this, Fisher (1936) has shown that

$$\sqrt{2\chi^2} - \sqrt{2(df) - 1}$$

may be used as a normal deviate with unit standard error. This means that the quantity above may be evaluated by reference to the table of the normal curve. Suppose, for example, that we had a χ² of 52.02 for 32 degrees of freedom. Is this value significant at the 5 per cent level?

11. Suppose that we had a χ² of 81.92 available and 36 degrees of freedom. Is this significant at the 5 per cent level?

12. Kendall and Smith (1939) have described the tests they applied to their tables of random numbers. All the numbers in the published tables were run off by an operator using an electrical device constructed for the purpose. One of the tests applied to the numbers drawn was the frequency test, which consisted of counting the frequencies of the digits from 0 to 9. Various sets of numbers were rejected, including this one:
Digit      f
  0      1,083
  1        865
  2      1,053
  3        884
  4      1,057
  5      1,007
  6      1,081
  7        997
  8      1,025
  9        948


(a) On the assumption of random selection each of the digits would have an expected frequency of 1,000. Calculate χ² to determine whether the departure from expectancy is significant. (b) How many degrees of freedom will be available for evaluating χ²?

13. Hartman (1939) tested men and women with various 
solutions of phenyl-thio-carbamide. The solutions were numbered 
in terms of strength from 0 to 10, and the threshold was recorded as 
the concentration below which they first tasted the presence of 
phenyl-thio-carbamide. Since some subjects tasted the weakest 
solution 0, the threshold for these subjects was recorded as below 
0, giving rise to 12 classes. The frequency distributions of the 
thresholds for 290 men and 314 women were as below: 


                    Frequency
Strength      Men    Women    Total
   10          15      42       57
    9          35      52       87
    8          46      38       84
    7          31      30       61
    6          23      19       42
    5          13      17       30
    4           9       6       15
    3           7       5       12
    2          10      10       20
    1          13      19       32
    0          25      33       58
Below 0        63      43      106

Find the value of χ².




14. Records were kept at a university medical clinic of
students who had attacks of influenza. Some of these students
had been given vaccinations against influenza and others had not.
The students were also classified in terms of whether they had
a severe attack or a minor attack of influenza. The data are
as follows:

                  Not Vaccinated    Vaccinated    Total
Severe attack           82              40         122
Minor attack            30              98         128

Test the independence of the 2 classifications.


CHAPTER 7 


Testing Hypotheses about
Correlation Coefficients


1. INTRODUCTION 


One of the most frequently used statistics in psychological 
research is the product moment coefficient of correlation. The 
coefficient of correlation is a measure of association or the extent 
to which changes in one variable are accompanied by or are 
associated with changes in a second variable. The coefficient 
may be positive or negative in sign and ranges in size from −1.00,
through zero, to +1.00.¹

A positive correlation coefficient between 2 tests, for example, 
will be obtained when subjects who are above the mean on one of 
the tests also tend to be above the mean on the second test, whereas 
subjects who are below the mean on one of the tests also tend to 
be below the mean on the second. A negative correlation co- 
efficient, on the other hand, will be obtained when subjects who 
are below the mean on one test tend to be above the mean on the 
other test, whereas subjects who are above the mean on the first
test tend to be below the mean on the second.²

Let one of the 2 variables for which the correlation co- 
efficient is to be computed be symbolized by X and the other 
variable by Y. Then, using r to designate the correlation
coefficient, we have

$$r = \frac{\sum xy}{\sqrt{(\sum x^2)(\sum y^2)}} \qquad (33)$$

where x = X − X̄ and y = Y − Ȳ are deviations from the respective means.

¹ When the coefficient is positive, the sign is usually omitted.

² This discussion of correlation is obviously incomplete. It is assumed
that the reader is familiar with the treatment of correlation as given in an
elementary statistics text.
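Formula (33) translates directly into code. The following is a minimal Python sketch, assuming X and Y are supplied as two equal-length lists of raw scores; the name `pearson_r` is illustrative only.

```python
import math


def pearson_r(xs, ys):
    """Formula (33): r = sum(xy) / sqrt(sum(x^2) * sum(y^2)),
    where x and y are deviations of X and Y from their means."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    x = [X - mean_x for X in xs]  # deviation scores for X
    y = [Y - mean_y for Y in ys]  # deviation scores for Y
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    if sum_x2 == 0 or sum_y2 == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sum_xy / math.sqrt(sum_x2 * sum_y2)


print(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```

Working in deviation scores mirrors the formula as written; a machine-calculation form based on raw sums would give the same r.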


2. THE SAMPLING DISTRIBUTION OF THE CORRELATION
COEFFICIENT


Let us suppose that we have made measurements of 2 
variables X and Y for a group of N subjects, so that for each subject 
we have a value of X and a corresponding value of Y. For this 
group of N subjects we compute the correlation coefficient by 
means of formula (33) and find it to be .60. Now if an indefinitely
large number of successive random samples of N subjects are 
drawn from the defined population and the value of r is computed 
for each sample, the distribution of these values will be the sampling 
distribution of r. If the various sample values of r were normally 
distributed about the population value, and if the standard error 
of the distribution were known, we could proceed to test various
hypotheses in the manner already familiar by means of the table 


of the normal curve. Unfortunately this is not the case. 
If the sample size upon which the r’s are based is small, the 


sampling distribution of r will be decidedly skew if the value of 
the correlation in the population is moderately large. The degree 
of skewness will be a function of both the sample size and the 
correlation in the population. The smaller the sample size and 
the greater the correlation in the population (either positive or 
negative), the greater the degree of skewness in the sampling
distribution.³ As the value of N, upon which the correlation
coefficients are based, increases in size, the skewness tends to
disappear, and the sampling distribution becomes more symmetrical.
When the value of the correlation in the population is zero, the 
sampling distribution of r for small samples is symmetrical, but 


not quite normal.


³ One reason for the skewness is the limit placed upon the degree to
which r can vary at one end of the scale when the population correlation is
moderately large. For example, the values of r based upon small samples
drawn from a population in which the correlation is .85 can vary above this
value to the extent of only .15, since r cannot exceed +1.00. At the same time
the sample values may fall below .85 as far as −1.00. Since the sample
values will occur less frequently the greater they depart from the population
value, there will be a clustering about the population value of .85 with
the tail of the distribution stretching out toward the negative end of the
continuum. The sampling distribution, in this instance, will be negatively
skewed.
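The skewness described above can be seen in a small simulation. This sketch is not from the text: it assumes a bivariate normal population (the usual setting for the sampling theory of r), draws many samples of 8 pairs, and compares the mean and median of the sample r's. In a negatively skewed distribution the long lower tail pulls the mean below the median; for ρ = 0 the two should nearly coincide. The function names and the seed are arbitrary choices.

```python
import math
import random
import statistics


def sample_r(n, rho, rng):
    """Draw n pairs from a bivariate normal population with correlation rho
    (unit variances) and return the sample correlation coefficient."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)
        xs.append(x)
        # Y = rho*X + sqrt(1 - rho^2)*E has unit variance and correlation rho with X.
        ys.append(rho * x + math.sqrt(1.0 - rho * rho) * e)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n, rho, reps=10000, seed=1):
    """Approximate sampling distribution of r from `reps` samples of n pairs."""
    rng = random.Random(seed)
    return [sample_r(n, rho, rng) for _ in range(reps)]


for rho in (0.0, 0.85):
    rs = simulate(8, rho)
    print(f"rho = {rho:.2f}: mean r = {statistics.fmean(rs):.3f}, "
          f"median r = {statistics.median(rs):.3f}")
```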


3. THE NORMAL CURVE TEST OF THE HYPOTHESIS OF ZERO 
CORRELATION 


If N is quite large and if the population correlation is not too
high, the standard error of r is often taken as

$$\sigma_r = \frac{1 - \rho^2}{\sqrt{N - 1}} \qquad (34)$$

where ρ is the population correlation. The symbol σ is used for
the standard error here instead of s because the formula assumes
that the population correlation is known exactly and is not
estimated in any way from the sample data. Let us specify by
hypothesis that the population correlation is zero, and let us assume
that we have obtained a value of r equal to .60 which is based
upon 11 pairs of observations. If the population correlation is
zero, then the value of ρ will become zero in formula (34).
Substituting and solving for the standard error of r we obtain

$$\sigma_r = \frac{1 - 0}{\sqrt{11 - 1}} = \frac{1}{3.162} = .3162$$
Now, when the population correlation is zero, the sampling 
distribution of r is symmetrical but not quite normal, as pointed 
out earlier. But let us assume, for the time being, that the dis- 
tribution is approximately normal about the mean, and we can 
later determine the degree of inaccuracy introduced by this assump- 
tion. Expressing the obtained value of r as a relative deviate by 
subtracting the mean of the sampling distribution, which we have 
specified as being equal to zero, and dividing by the standard 
error of r, we obtain 

$$z = \frac{.60 - .00}{.3162} = 1.898$$
We know, by reference to the table of the normal distribution, 
that a z of 1.96 is required for significance at the 5 per cent level, 
and thus the value of 1.898 is not significant. The hypothesis of 
random sampling from a population with zero correlation would 
be regarded as tenable. 
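The normal-curve test just carried out, together with the smallest r that it would call significant (the quantity examined in the following paragraph), can be sketched in a few lines of Python. The function names are illustrative, and the results inherit the approximation built into formula (34).

```python
import math


def sigma_r(rho, n):
    """Formula (34): approximate standard error of r, (1 - rho^2) / sqrt(N - 1)."""
    return (1.0 - rho ** 2) / math.sqrt(n - 1)


def normal_test_zero_r(r, n):
    """Relative deviate of r under the hypothesis rho = 0, using formula (34)."""
    return (r - 0.0) / sigma_r(0.0, n)


z = normal_test_zero_r(0.60, 11)
# Smallest r that this normal approximation would treat as significant at .05.
r_needed = 1.96 * sigma_r(0.0, 11)
print(f"z = {z:.3f}, r needed = {r_needed:.3f}")
```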




In order to get some idea of the inaccuracy introduced by the 
assumption of normality, let us solve for the value of r that would 
give a significant value of z, i.e., equal to 1.96. Since a value of

$$z = \frac{r}{.3162} = 1.96$$

would be regarded as significant, a significant value of r
would have to be equal to at least (1.96)(.3162) = .620. Before
rejecting the hypothesis of random sampling from a population
with zero correlation, the value of r, based upon 11 pairs of
observations, must therefore be equal to at least .620, if the evaluation
is made by means of the table of the normal curve and the signifi- 
cance level is taken as .05. The exact value, obtained by the 
method of the next section, is .602. So we see that the error 
involved in the use of formula (34) and the normal curve table is 
not too great in testing the hypothesis that the population cor- 
relation is zero, even when the sample size is as small as 11. If 
we wish, however, to test hypotheses about the population correlation 
other than zero, the skewness of the sampling distribution of r for 
small samples makes the use of this method extremely questionable
and unreliable.

Two solutions to the problem of the sampling distribution of 
r have been developed. Both involve the transformation of r 
to another statistic for which the sampling distribution is known 
exactly and for which tables are available. One of these
transformations, t, is limited to testing the hypothesis that the
population correlation is zero and corrects for the slight inaccuracy
introduced by the assumption of normality and the use of the
table of the normal curve in evaluating this hypothesis. The
other transformation, z′, is more general and can be used to test
a variety of hypotheses.


4. THE t TEST OF THE HYPOTHESIS OF ZERO CORRELATION


The distribution of t depends upon the number of degrees
of freedom available, as did the distribution of χ². The table of
t (Table V, Appendix) is thus a two-dimensional table and must
be entered with both the number of degrees of freedom and the
observed value of t.⁴ For this reason, the table of t is not as
complete as the table of the normal distribution, which is entered
with only the value of z. The column headings of Table V show
the proportion of the total area cut off in the two tails of the t
distribution at ordinates erected at the entries in the body of the
table for the number of degrees of freedom given at the extreme
left. Thus, for 10 degrees of freedom, ordinates erected at plus
and minus values of t equal to 2.228 will cut off .05 of the total
area. This means that .025 of the total area will fall to the right
of the ordinate at 2.228 and .025 to the left of the ordinate at
−2.228. We may thus say that for 10 degrees of freedom the
probability of an absolute value of t equal to or greater than 2.228
is .05. The probability of a positive t equal to or greater than 2.228
is .025, and the probability of a negative t equal to or less than
−2.228 is .025. It is important to keep in mind that the tabled
probabilities correspond to a two-tailed test of significance;
they are levels of significance. If a one-tailed test of significance
is desired, the tabled probability should be halved.

Under the hypothesis that the population correlation is 
zero, then, 


$$t = \frac{r}{\sqrt{1 - r^2}}\,\sqrt{N - 2} \qquad (35)$$


is distributed in accordance with the values of Table V for degrees 
of freedom equal to N − 2.⁵ Let us suppose that we have
obtained a correlation of .60 and that this is based upon 11 pairs
of observations. Then, substituting in formula (35), we obtain

$$t = \frac{.60}{\sqrt{1 - (.60)^2}}\,\sqrt{11 - 2} = \frac{.60}{.80}\,(3) = 2.25$$


The number of degrees of freedom available for evaluating
this t of 2.25 will be N − 2 or 9. By reference to the table of t,
we find that for 9 degrees of freedom a value of 2.262 will be
required for significance at the 5 per cent level. This is a two-tailed
test and refers to the probability of obtaining an absolute
value of t equal to or greater than 2.262. Since the obtained value
of t, 2.25, does not quite equal the tabled value 2.262 at the 5 per
cent level, the hypothesis of random sampling from a zero-correlated
population would be regarded as tenable.

⁴ The t distribution is discussed in greater detail in Chap. 8.

⁵ Fisher (1936), p. 196.
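A minimal sketch of the t test of formula (35) follows. Python's standard library has no t distribution, so the critical value 2.262 is typed in from a t table (Table V) rather than computed; `t_for_r` is an illustrative name.

```python
import math


def t_for_r(r, n):
    """Formula (35): t = r * sqrt(N - 2) / sqrt(1 - r^2), with N - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


n = 11
t = t_for_r(0.60, n)
df = n - 2
T_CRIT_05 = 2.262  # two-tailed .05 value of t for 9 df, read from a t table
significant = abs(t) >= T_CRIT_05
print(f"t = {t:.2f} with {df} df; significant at .05: {significant}")
```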


5. TABLE OF SIGNIFICANT VALUES OF THE CORRELATION 
COEFFICIENT 


Now it should be obvious that by substituting the values of 
t from Table V which are significant at the .05 and .01 levels of 
significance in formula (35), and by substituting various values
of N also, it would be possible to solve for the values of r which
would be significant at these levels. Table VI (Appendix) gives
these values, and by reference to this table one can readily
determine whether a given value of r based upon a given number of
degrees of freedom (N − 2) is sufficiently large to cause us to
reject the hypothesis that the population correlation is zero. If
the obtained value of r equals or exceeds the tabled value for the
number of degrees of freedom involved, then the hypothesis would
be rejected. Entering Table VI with the 9 degrees of freedom
available for the r of .60 which we obtained above, we find that
this value would have to be equal to .602 to be significant at the
5 per cent level.

It may be observed from Table VI that small r's may be
statistically significant when based upon a large number of cases,
whereas large values of r may not be significant when based upon
a small number of cases. Thus, an r of .55 based upon 10 pairs
of observations may be expected to occur quite frequently as a
result of random sampling variation, even when the population
value is zero. For 10 observations giving 8 degrees of freedom,
the sample value of r would have to be equal to .632 to be
significant at the .05 level. On the other hand, if the sample value
of r was based upon slightly over 300 pairs of observations, the
hypothesis of zero correlation would be rejected, even if the sample
value was as small as .113.
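Table VI can be reproduced from a t table by solving formula (35) for r. Squaring (35) gives t²(1 − r²) = r²(N − 2), so r = t/√(t² + df) with df = N − 2. The sketch below assumes two-tailed .05 values of t of 2.306, 2.262, and 1.968 for 8, 9, and 300 degrees of freedom, copied from a standard t table; `critical_r` is an illustrative name.

```python
import math


def critical_r(t_crit, df):
    """Invert formula (35): r = t / sqrt(t^2 + df), where df = N - 2.
    Returns the smallest r significant at the level of the supplied t."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


# Two-tailed .05 values of t, taken from a t table (assumed inputs).
for df, t05 in [(8, 2.306), (9, 2.262), (300, 1.968)]:
    print(f"df = {df:3d}: r must reach {critical_r(t05, df):.3f}")
```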
While Table VI is extremely convenient for testing the
hypothesis that the population correlation is zero, it is of no value
in testing other hypotheses concerning the population value. Nor
can formula (35), which gives us t, be used for this purpose. We
cannot, for example, test the hypothesis that a sample has been 
randomly drawn from a population in which the correlation is, 
let us say, .84, or some other specified value. 

Suppose that we had obtained an r of .45 with 42 pairs of 
observations. For 40 degrees of freedom the obtained value 
exceeds the value in Table VI and the hypothesis of zero correlation 
would be rejected. But would the hypothesis that the sample 
was randomly drawn from a population in which the correlation 
was .23 be rejected? Could the sample have been randomly 
drawn from a population in which the correlation is as high as 
.82? To answer these questions, we shall have to make use of
the z′ transformation.


6. THE z′ TRANSFORMATION FOR THE CORRELATION
COEFFICIENT


The value of z′ is obtained by

$$z' = \tfrac{1}{2}\left[\log_e (1 + r) - \log_e (1 - r)\right] \qquad (36)$$

where r is the observed value of the correlation coefficient. In
order to make the z′ values directly available without resort to
a table of natural logarithms, values of r were substituted in
formula (36), and the corresponding values of z′ were found.
These values are given in Table VII (Appendix).⁶ It is possible
to enter Table VII with a given value of r and to read directly
the corresponding value of z′. For negative values of r the tabled
z′ value should be given a negative sign.
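Formula (36) is the inverse hyperbolic tangent, so a computer can stand in for Table VII in both directions: Python's `math.atanh` gives z′ from r and `math.tanh` gives r from z′. The sketch below writes (36) out explicitly and prints z′ for several values of r that appear in this chapter; the function names are illustrative.

```python
import math


def z_prime(r):
    """Formula (36): z' = 1/2 [ln(1 + r) - ln(1 - r)], defined for -1 < r < 1.
    Identical to math.atanh(r); a negative r automatically gives a negative z'."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and +1")
    return 0.5 * (math.log(1.0 + r) - math.log(1.0 - r))


def r_from_z_prime(z):
    """Inverse of formula (36): r = tanh(z'), in place of reading Table VII backward."""
    return math.tanh(z)


for r in (0.60, 0.73, 0.82, 0.85):
    print(f"r = {r:.2f}   z' = {z_prime(r):.3f}")
```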

Fisher (1921) has shown that the distribution of z′ is approximately
normal and, for all practical purposes, may be considered
independent of the correlation in the population. Some indication
of the extent to which the z′ transformation normalizes the
distribution of r for small samples may be gained from an
examination of Figures 3 and 4. The first drawing shows the distribution
curve for values of r based upon 8 pairs of observations drawn from
a population where the correlation is zero and also the distribution
curve for 8 pairs of observations drawn from a population where
the correlation is .80. The second drawing shows the distribution
of z′ from the same populations and for the same number of pairs
of observations.

[Fig. 3. Sampling distribution of the correlation coefficient for samples of
8 pairs drawn from populations having correlations as indicated (ρ = .00 and
ρ = .80). Horizontal axis: values of r, from −1.0 to 1.0.]

[Fig. 4. The distribution curves for z′ for samples of 8 pairs of observations
drawn from populations having the correlations indicated (ρ = .00 and
ρ = .80). Horizontal axis: values of z′, from −2.5 to 2.5.]

⁶ Table VII was constructed by F. P. Kilpatrick and D. A. Buchanan.
To illustrate the use of the z′ transformation, let us suppose
that we have obtained a correlation coefficient of .82 between the
attitude test scores of mothers and daughters, and that we have 
20 pairs of observations. Is the hypothesis that the population 
value is equal to .85 tenable? The z′ value corresponding to the
observed r of .82 is found from Table VII to be equal to 1.157.
The z′ value corresponding to the population value of .85 is 1.256.
Under the hypothesis that the population correlation is .85, z′
will be normally distributed about the mean value equal to 1.256.

Now Fisher (1921) has shown that the standard error of the
distribution of z′ will be given by

$$\sigma_{z'} = \frac{1}{\sqrt{N - 3}} \qquad (37)$$

In the present problem, since N is equal to 20, we obtain as the
standard error

$$\sigma_{z'} = \frac{1}{\sqrt{20 - 3}} = \frac{1}{4.123} = .243$$

Then, since z′ is taken as normally distributed, the test of
the hypothesis is made by means of the table of the normal curve.
Expressing the observed z′ as a relative deviate, we have

$$z = \frac{1.157 - 1.256}{.243} = \frac{-.099}{.243} = -.407$$


By reference to the table of the normal curve, we find that 
an absolute value of z equal to 1.96 will be required for significance 
at the 5 per cent level. Hence, the obtained value of z equal to 
−.407 offers no evidence against the hypothesis that the sample
was randomly drawn from a population in which the correlation 
was .85. 
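The test just made can be written as a short function. This sketch assumes, as the text does, that z′ is normal with standard error 1/√(N − 3). It computes z′ with `math.atanh` instead of reading Table VII, so the deviate comes out about −.410 rather than −.407; the small difference comes from the three-place rounding used above.

```python
import math


def z_test_rho(r, n, rho0):
    """Relative deviate for the hypothesis that the population correlation is rho0.
    Uses z' = atanh(r) and the standard error 1/sqrt(N - 3) of formula (37)."""
    se = 1.0 / math.sqrt(n - 3)
    return (math.atanh(r) - math.atanh(rho0)) / se


z = z_test_rho(0.82, 20, 0.85)
print(f"z = {z:.3f}, significant at .05: {abs(z) >= 1.96}")
```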


7. ESTABLISHING THE FIDUCIAL LIMITS 


In exactly the same manner as that described above, we 
might proceed to test other hypotheses about the population 
correlation. But instead of testing the indefinitely large number 
of possible hypotheses, it is more convenient to establish limits 
within which hypotheses about the population value will be 
regarded as tenable and outside of which any hypothesis would
be considered untenable. These limits are called the fiducial
limits⁷ of the parameter at a defined level of significance, and the
interval within the fiducial limits is called a fiducial interval or
confidence interval.

The fiducial limits of the parameter at the 5 per cent level 
will be established for a normal distribution at hypothetical values 
of the population which give rise to z values of 1.96 and —1.96, 
for we know that any hypothesis which results in our obtaining 
either of these values or values larger than these will be rejected 
(P will equal .05 or less). On the other hand, any hypothesis 
which gives rise to a value of z less than these will be regarded 
as tenable (P will be greater than .05). Thus, whenever

$$z = \frac{z' - Z'}{\sigma_{z'}} \geq 1.96 \quad \text{or whenever} \quad z \leq -1.96$$

where Z′ denotes the z′ value corresponding to the hypothesized
population correlation, the hypothesis concerning Z′ will be rejected.⁸
From this we may easily set up equations which will determine the
lower fiducial limit (z₁′) and the upper fiducial limit (z₂′) as follows:

$$z_1' = z' - (1.96)(\sigma_{z'}) \quad \text{and} \quad z_2' = z' + (1.96)(\sigma_{z'}) \qquad (38)$$

For the case at hand, where we have obtained a value of r
equal to .82 with N equal to 20 and, hence, σ_z′ equal to .243, we
may substitute in (38) above and solve for z₁′ and z₂′. Thus,

$$z_1' = 1.157 - (1.96)(.243) = 1.157 - .476 = .681$$

and

$$z_2' = 1.157 + (1.96)(.243) = 1.157 + .476 = 1.633$$


These fiducial limits, expressed in terms of z′, may now be
translated by means of Table VII (Appendix) into the corresponding
values of r. For the z₁′ value of .681 we find that the
corresponding r is approximately .59, and for the z₂′ value of 1.633, we find
that the r is approximately .93. Thus, any hypothesis that the
population value of the correlation falls within the limits of .59
and .93 would be considered tenable. Any hypothesis that the
population value is equal to or less than .59 would be rejected
as would any hypothesis that the population value is equal to or
greater than .93. The 2 values .59 and .93 establish the fiducial
limits of the parameter at the 5 per cent level of significance.

⁷ From the Latin fiducia, indicating trust or confidence. In physics the
term indicates a standard of reference, for example, a fiducial line or
fiducial point.

⁸ The symbol ≥ is read as “equal to or greater than.”

It may be observed that the upper fiducial limit and the 
lower fiducial limit, expressed in terms of r, are not equally spaced 
from the observed value of r. This is a result of the skewed 
distribution of r for samples as small as the one at hand drawn 
from a moderately correlated population. At the same time it 
may be observed that the fiducial limits expressed in terms of z′
values are equally spaced from the observed value of z’, a result 
of the normalizing influence of the z’ transformation. 

It is also worth noting that the sample value of .82 is not a 
very reliable estimate of the population parameter, for the evidence 
at hand indicates that any hypothesis that the population value 
falls within the limits of .59 and .93 would be considered tenable. 
To illustrate the greater normality of the distribution of r when 
the sample size is increased, and also that the sample value 
becomes a more reliable estimate of the population value, let us 
suppose that we had obtained the same value of r, namely, .82, 
with a sample of 403 cases instead of 20. The standard error
of z′ will now be

$$\sigma_{z'} = \frac{1}{\sqrt{403 - 3}} = \frac{1}{20} = .05$$

Solving for the fiducial limits by means of formula (38), we
now obtain

$$z_1' = 1.157 - (1.96)(.05) = 1.157 - .098 = 1.059$$

and

$$z_2' = 1.157 + (1.96)(.05) = 1.157 + .098 = 1.255$$


The values of r corresponding to these values of z′ are
approximately .78 and .85. Note that these values are more equally
spaced on each side of the observed value of r, .82, and that the
fiducial limits are also much narrower than previously found for
the sample of 20 cases. The range, for example, in terms of


values of r is now .07 compared with the range of .34 obtained 
with the sample of 20 cases. 

It is obvious that if we wish to estimate the population value
of r with any degree of reliability, it would be necessary to have
a fairly large sample.
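Formula (38) and the conversion back to r can be combined in one sketch. It uses `math.atanh` and `math.tanh` in place of Table VII and 1.96 for the 5 per cent limits. With N = 20 it gives limits of about .59 and .93; for N = 403 the lower limit comes out near .785, so the two-place figure read from the table depends on rounding.

```python
import math


def fiducial_limits_r(r, n, z_crit=1.96):
    """Formula (38): z' -/+ z_crit * sigma_z', converted back to the r scale.
    Returns (lower, upper) fiducial limits for the population correlation."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)  # formula (37)
    z1, z2 = z - z_crit * se, z + z_crit * se
    return math.tanh(z1), math.tanh(z2)


for n in (20, 403):
    lo, hi = fiducial_limits_r(0.82, n)
    print(f"N = {n}: {lo:.3f} to {hi:.3f}")
```

The limits are symmetric about z′ but not about r, which is the asymmetry discussed above.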


8. TESTING THE SIGNIFICANCE OF THE DIFFERENCE BETWEEN
2 CORRELATION COEFFICIENTS


The z' transformation may also be used in testing the signifi- 
cance of the difference between 2 correlation coefficients derived 
from 2 independent sets of observations. Let us assume that an 
r of .73 is obtained between scores on a vocabulary test of psycho- 
logical terms and scores on a final examination in general psy- 
chology with a class of 100 students. With another class of 35 
students, we obtain an r between the same 2 variables of .60. We 
wish to know whether these 2 values of r differ significantly or 
whether we can assume that they are random samples from a 
common population.

The z′ corresponding to the r of .73 is .929. The z′
corresponding to the r of .60 is .693. The difference between the
2 values of z′ is .236. Is this difference sufficiently great to cause
us to reject the hypothesis under examination? If we took an
indefinitely large number of pairs of samples with the same n's as
we have here from the same population and found the r's and
corresponding values of z′ for each pair of samples, the differences
between the pairs of z′ values would form a sampling distribution.
The mean of this sampling distribution would be zero, and the
differences between the z′ values would be normally distributed
about this mean.

In order to evaluate the difference between the pair of z′
values at hand, we need to know the standard error of the sampling
distribution of differences under the hypothesis noted. This
standard error is given by

$$\sigma_{z_1' - z_2'} = \sqrt{\sigma_{z_1'}^2 + \sigma_{z_2'}^2} \qquad (39)$$


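Substituting in the above formula with the sample n's, we obtain

$$\sigma_{z_1' - z_2'} = \sqrt{\frac{1}{100 - 3} + \frac{1}{35 - 3}} = \sqrt{\frac{1}{97} + \frac{1}{32}} = \sqrt{\frac{129}{(32)(97)}} = .204$$

Then expressing the difference between the 2 observed values of
z′ as a relative deviate, we have, since the mean difference between
the z′ values is specified by hypothesis to be zero,

$$z = \frac{(z_1' - z_2') - 0}{\sigma_{z_1' - z_2'}} = \frac{(.929 - .693) - 0}{.204} = \frac{.236}{.204} = 1.16$$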


The value of z, 1.16, obtained above we know is not significant
at the 5 per cent level, for this standard of significance will require
a value of 1.96. Hence, the data offer no evidence against the
hypothesis that the 2 samples have been randomly drawn from a
common population. The 2 values of z′ and thus the values of the
2 r's, we say, do not differ significantly.

It is worth noting again that we have made a two-tailed test 
of significance, and this test is consistent with the hypothesis of 
random sampling from a common population. The sampling dis- 
tribution, being symmetrical about zero, would include negative 
as well as positive differences between the pairs of z' values. 
Whether the observed difference is positive or negative is arbitrary, 
depending only upon whether or not we subtract .693 from .929 
or subtract .929 from .693. 
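A sketch of the test for the difference between 2 independent r's follows, assuming the standard error of formula (39) with σ²_z′ = 1/(n − 3) for each sample. Swapping the two samples changes only the sign of z, which is why the two-tailed test described above is the appropriate one. The function name is illustrative.

```python
import math


def z_diff_test(r1, n1, r2, n2):
    """Relative deviate for the difference between two independent r's.
    The standard error follows formula (39), with sigma_z'^2 = 1/(n - 3)."""
    se_diff = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (math.atanh(r1) - math.atanh(r2)) / se_diff


z = z_diff_test(0.73, 100, 0.60, 35)
print(f"z = {z:.2f}, significant at .05: {abs(z) >= 1.96}")
```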


9. FINDING AN AVERAGE VALUE OF THE CORRELATION
COEFFICIENT FOR 2 SAMPLES


Since the hypothesis that the 2 r's were obtained from a
common population was regarded as tenable, we may combine
these 2 values to arrive at an estimate of the population value.
But instead of taking the simple average of the 2 r's or the 2
values of z′, we shall take a weighted average which will take into
account the fact that the 2 estimates are not equally reliable.
One estimate is based upon 100 cases and the other upon 35, and
we have already observed that the stability of r is dependent upon
the sample size. We shall take the z′ values corresponding to the
2 r's and weight these values in inverse proportion to their
variances, or, in other words, we shall weight each z′ value by
multiplying it by its corresponding value of n − 3. We then sum these
2 values and divide by Σ(n − 3) to obtain the average z′. From
Table VII (Appendix) we may then find the r which corresponds
to the average z′ and which will be the estimate desired.

The sum of the weighted z′ values will be equal to (97)(.929) +
(32)(.693) = 90.113 + 22.176 = 112.289. Then dividing this sum
by Σ(n − 3), we have 112.289/129 = .870. From Table VII
(Appendix) we find that the r corresponding to a z’ of .870 is 
approximately .70. This is the desired estimate of the population 
value based upon the combined data from the 2 samples. 

If we had taken the unweighted average of the 2 z′ values,
we would have .929 + .693 = 1.622 which, when divided by 2,
gives an average value of .811. The r corresponding to this is .67.
The simple average of the 2 r's would be equal to .66. Both of
these unweighted averages overemphasize the contribution of the
sample based upon 35 cases compared with the contribution of the
sample based upon 100 cases. Since values of r based upon
relatively small samples are quite unstable, it seems only logical
to use the average of the weighted z′ values as the estimate of the
population value. This estimate takes into account the difference
in the sample sizes.
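The weighted average can be sketched as follows, assuming weights of n − 3 as described and using `math.atanh`/`math.tanh` in place of Table VII. Because z′ is not rounded to three places, the weighted z′ comes out .8703 rather than exactly .870, and the estimated r rounds to .70 as in the text. The function accepts any number of samples; `pooled_r` is an illustrative name.

```python
import math


def pooled_r(rs, ns):
    """Average the z' values weighted by n - 3 (the reciprocal of each variance),
    then convert the average z' back to r. Returns (r, average z')."""
    weights = [n - 3 for n in ns]
    z_bar = sum(w * math.atanh(r) for w, r in zip(weights, rs)) / sum(weights)
    return math.tanh(z_bar), z_bar


r_bar, z_bar = pooled_r([0.73, 0.60], [100, 35])
print(f"average z' = {z_bar:.3f}, estimated r = {r_bar:.2f}")
```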


10. AN AVERAGE VALUE OF THE CORRELATION COEFFICIENT 
BASED UPON SEVERAL SAMPLES 


Let us extend the methods just described. Suppose that we
have several values of r based upon different samples. We wish
to obtain an estimate of the population value, making use of the
information available from the several samples, assuming that the
samples are from a common population. The method described
above may be applied in this instance also. We merely weight
each of the various z′ values corresponding to the values of r by
the reciprocals of their variances or, in other words, by n − 3. We
find the sum of these weighted z′ values and divide by Σ(n − 3)
to obtain the average value. The population estimate desired is
the r obtained from Table VII (Appendix) corresponding to the
average z′.


This procedure, as in the case of 2 samples, is valid only if 
the samples have been randomly drawn from a common popu- 
lation. Before proceeding with the method of averaging, a test of 
this hypothesis is in order. The test of the hypothesis is
conveniently made by means of the χ² distribution. To illustrate
both the method of averaging and the test of the hypothesis that
the samples are from a common population, let us suppose that
we have given a pre- and an end-test to several sections of
introductory psychology. The correlation coefficients, the n's of the
sections, and the z′ values corresponding to the r's are given in
Table 17.


TABLE 17. Data for Calculating χ² to Evaluate the Hypothesis of Random
Sampling from a Population with a Common Correlation Coefficient⁹

  (1)       (2)     (3)     (4)      (5)        (6)           (7)
Section      n       r     n − 3     z′     (n − 3)z′    (n − 3)(z′)²
   1        33     .53      30      .590     17.700        10.443
   2        58     .62      55      .725     39.875        28.909
   3        42     .65      39      .775     30.225        23.424
   4        47     .45      44      .485     21.340        10.350
  Sum      180    2.25     168     2.575    109.140        73.126


The first 6 columns of Table 17 should be perfectly clear.
The sum of column 6 when divided by the sum of column 4 gives
the average value of z′. This is equal to 109.140/168 = .650,
and the value of r, .571, corresponding to the average z′, is found
from Table VII (Appendix). We may note that the estimate of the
population value .571 obtained from the weighted z′ values does
not differ very much from the value which would have been
obtained by simply averaging the r's themselves, which would give
2.25/4 = .563. This will not always be true. The reason it is so
here is that the n's of the various sections do not differ greatly nor
do the various values of r. With greater variation in the n's and
r's, the two estimates may differ considerably.


⁹ Snedecor (1946), p. 154. A proof is given by Kempthorne (1947).


11. TESTING THE HYPOTHESIS THAT SEVERAL SAMPLES ARE 
FROM A COMMON POPULATION 


We have more or less put the cart before the horse, finding 
the estimated population value from the combined sections before 
testing the hypothesis that the samples are from a common popula- 
tion. The necessary data for the test of this hypothesis are given 
in column 7 of Table 17. The entries in this column are
conveniently obtained by multiplying the entries in column 6 by
those in column 5. χ² will then be given by

$$\chi^2 = \sum (n - 3)(z')^2 - \frac{\left[\sum (n - 3)\,z'\right]^2}{\sum (n - 3)} \qquad (40)$$

Substituting in the above formula with the appropriate quantities
from the table, we obtain

$$\chi^2 = 73.126 - \frac{(109.140)^2}{168} = 73.126 - 70.902 = 2.224$$


The number of degrees of freedom for evaluating the χ²
obtained above will be equal to the number of samples minus 1.
Since we have 4 samples, we have 3 degrees of freedom, and from
the table of χ² it can be readily determined that the obtained value
of 2.224 is not significant. Hence, we must consider the hypothesis
that the samples have been randomly drawn from a common
population as tenable.¹⁰ The estimated population value of the
correlation coefficient based upon the combined data of the several
samples is .571.
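A sketch of formula (40) applied to the Table 17 sections follows. The r for section 1 is taken as .53, the value consistent with its tabled z′ of .590 and with the column total of 2.25. Because z′ is computed exactly with `math.atanh` rather than read to three decimals, χ² comes out about 2.23 instead of 2.224. The 5 per cent point of χ² for 3 degrees of freedom, 7.815, is typed in from a χ² table.

```python
import math


def homogeneity_chi2(rs, ns):
    """Formula (40): chi2 = sum((n-3) z'^2) - [sum((n-3) z')]^2 / sum(n-3).
    Returns (chi2, degrees of freedom = number of samples - 1)."""
    w = [n - 3 for n in ns]
    z = [math.atanh(r) for r in rs]
    s_wz = sum(wi * zi for wi, zi in zip(w, z))         # column 6 total
    s_wz2 = sum(wi * zi * zi for wi, zi in zip(w, z))   # column 7 total
    return s_wz2 - s_wz ** 2 / sum(w), len(rs) - 1


# Section r's and n's from Table 17 (r = .53 for section 1, as noted above).
chi2, df = homogeneity_chi2([0.53, 0.62, 0.65, 0.45], [33, 58, 42, 47])
CHI2_05_DF3 = 7.815  # 5 per cent point of chi-square for 3 df, from a table
print(f"chi2 = {chi2:.3f} with {df} df; significant at .05: {chi2 >= CHI2_05_DF3}")
```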


¹⁰ The χ² test may also be applied to 2 samples to test the hypothesis
that they are random samples from a common population. However, the
evaluation of the difference between 2 sample values of r is most easily
accomplished by finding the standard error of the difference between the 2
corresponding values of z′, and then expressing the difference between the
z′ values as a relative deviate. See p. 131.


12. THE FIDUCIAL LIMITS FOR AN AVERAGE VALUE OF THE 
CORRELATION COEFFICIENT 


We may now proceed to establish the fiducial limits for the r of .571, which corresponds to .650, the average of the weighted z' values. The accuracy of the r of .571 based upon the combined samples is equivalent to that of a single sample based upon 168 + 3 = 171 pairs of observations. Hence, its standard error, given by formula (37), will be

$$\sigma_{z'} = \frac{1}{\sqrt{N-3}} = \frac{1}{\sqrt{171-3}} = \frac{1}{12.961} = .077$$


Then, by means of formula (38), the lower and upper fiducial limits at the 5 per cent level will be

$$z_1' = .650 - (1.96)(.077) = .650 - .151 = .499$$

and

$$z_2' = .650 + (1.96)(.077) = .650 + .151 = .801$$


The values of r corresponding to a z' of .499 and a z' of .801 are obtained from Table VII (Appendix) and are .461 and .664. We see that the average r of .571, being based upon a larger number of cases, is a much more reliable estimate of the population value than any of the sample values considered separately.¹²
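The same steps can be sketched in Python using the text's figures: weighted mean z' = .650 and Σ(n − 3) = 168. The function name is hypothetical. The sketch carries unrounded values (a standard error of .0772 rather than .077) and uses an exact inverse transformation in place of Table VII. Its limits may therefore differ from .461 and .664 in the third decimal place.

```python
import math


def fiducial_limits_average_r(z_bar, sum_n_minus_3, z_crit=1.96):
    """5 per cent fiducial limits for r from a weighted mean z'.

    The combined samples are treated as one sample with N - 3 = sum(n - 3),
    so the standard error of z' is 1 / sqrt(sum(n - 3)), as in formula (37).
    """
    se = 1.0 / math.sqrt(sum_n_minus_3)
    z_low, z_high = z_bar - z_crit * se, z_bar + z_crit * se
    # Convert the z' limits back to r (the inverse of z' = atanh(r)).
    return math.tanh(z_low), math.tanh(z_high)


low, high = fiducial_limits_average_r(0.650, 168)
print(f"r between {low:.3f} and {high:.3f}")
```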


13. A CORRECTION FOR A SYSTEMATIC BIAS IN AVERAGING z' VALUES


In combining the r's from a fairly large number of samples by the method described, a slight bias present in each z' value accumulates. This bias tends to make the estimated population value of the correlation, based upon the average of the weighted z'

11 Fisher (1936), pp. 207-208. 

12 In some research problems it may be desired to test the significance of the difference between 2 such average values of r obtained by the methods above. This can be accomplished in the manner described earlier, by finding the standard error of the difference between the corresponding values of z'. The values of n₁ and n₂ in formula (39) should be taken as the corresponding values of Σ(n − 3) + 3.


TESTING HYPOTHESES ABOUT CORRELATION COEFFICIENTS 137 


values, somewhat too large.¹³ If the χ² test does not result in the rejection of the hypothesis of random sampling from a common population, then it may sometimes be desirable to obtain the most accurate estimate possible of the population value. Fisher (1921) gives as a correction term for the bias the value

$$\frac{\rho}{2(n-1)}$$


which is to be subtracted from each z' value. The numerator of the correction term is the population value, and this, of course, is unknown. However, the value of r corresponding to the average of the weighted z' values, obtained in the manner described, might be substituted for the population value in the correction term. The corrected z' for the first section in Table 17 would then be

$$\text{Corrected } z' = .590 - \frac{.571}{2(33-1)} = .590 - .009 = .581$$


The corrected z' values for the other sections can be obtained in the same way as for the first section. These corrected values for the second, third, and fourth sections are .720, .768, and .479, respectively. Multiplying each corrected z' value by the reciprocal of its variance and summing, we obtain (30)(.581) + (55)(.720) + (39)(.768) + (44)(.479) = 108.058. The sum of the weighted, corrected values of z' divided by Σ(n − 3) will give the corrected average z'. This is equal to 108.058/168 = .643. The r corresponding to the corrected average z' is .567, compared to the value of .571 obtained from the uncorrected values. The r obtained from the corrected average z' value will always be somewhat smaller than the r corresponding to the uncorrected mean of the z's.
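Fisher's correction can be sketched in a few lines of Python. This is a minimal sketch; the function name is hypothetical. It takes n = 33 for the first section, which is implied by the weight n − 3 = 30 the text uses for that section. It then converts the corrected weighted sum quoted in the text (108.058) over Σ(n − 3) = 168 back to r.

```python
import math


def bias_corrected_z(z, n, rho_estimate):
    """Fisher's (1921) correction: subtract rho / (2(n - 1)) from a sample z'.

    rho is unknown, so the r from the uncorrected weighted mean z' is used.
    """
    return z - rho_estimate / (2 * (n - 1))


# First section of Table 17: z' = .590, n = 33, estimated rho = .571.
z1 = bias_corrected_z(0.590, 33, 0.571)
print(round(z1, 3))

# Corrected weighted sum and total weight sum(n - 3) quoted in the text.
r_corrected = math.tanh(108.058 / 168)
print(round(r_corrected, 3))
```

Because every correction term is positive when the estimated correlation is positive, the corrected mean z', and hence the corrected r, can only move downward.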

The correction term for the z' values is of relatively little importance when the n's of the various samples are large and only a few samples are to be combined. With small n's and a large number of samples, the correction may result in an estimate of the population value which differs considerably from that based upon the uncorrected values of z'.

13 Fisher (1921).


14. EXAMPLES 

1. Given an r of .26. Approximately how large must N be 
before we can reject the hypothesis that the population value is 
zero? 

2. Given an r of .42 based upon 45 cases. Is this significant at the 5 per cent level?

3. Leahy (1935) reports an r of .20 for the correlation between the intelligence test scores of fathers and their children, based upon a sample of 186 cases. Leahy also reports that Freeman found an r of .28 for the same measures on another sample of 255 cases. (a) Can we assume that these 2 values were obtained from a common population? (b) If the r's do not differ significantly, combine them to obtain an estimate of the population value and establish the fiducial limits at the 5 per cent level.

4. Leahy (1935) also reports an r of .22 for 177 cases when 
the correlation between the vocabulary test scores of fathers and 
their adopted children was computed. Leahy states that Burks 
found an r of .13 for the same measures with another sample of 
181 cases. (a) Can we assume that these 2 values were obtained 
from a common population? (b) If the r's do not differ significantly, combine them to obtain an estimate of the population
value and establish the fiducial limits at the 5 per cent level. 

5. An r of .82 is obtained with a random sample of 39 cases.
Establish the fiducial limits at the 5 per cent level. 

6. Suppose that N was increased to 628 cases in the above 
example. What are the fiducial limits at the 5 per cent level with 
this N? Are they more symmetrical about the sample value? 
Why? 

7. Two random samples of 39 cases and 67 cases yield cor-
relation coefficients of .38 and .52, respectively. May we infer, 
at the 5 per cent level, that the 2 samples are from a common 
population? 

8. Three random samples of 19, 84, and 28 cases give correlation coefficients of .32, .50, and .39, respectively. (a) Test the hypothesis that the several r's have been randomly drawn from a common population. (b) If the hypothesis of a common population is tenable, combine the r's to obtain a single estimate of the population value and establish the fiducial limits at the 5 per cent level.

9. (a) Test the following set of r’s for homogeneity. (b) If 
they can be combined, find an average value for r and establish 
the fiducial limits at the 5 per cent level. 


Sample n r 
1 10 .59
2 30 .32
3 22 AS 
4 60 .30 
5 53 .25 


10. The data reported below have been taken from Snedecor 
(1935). The correlations reported are between actual grades in 
a freshman mathematics course and estimated scores for the same 
students based upon a preliminary test. The samples represent 
various sections of the freshman mathematics course. 

2 3 4 5 6 7 & 9 10 12 18 


18 18 20 20 21 21 19 20 
79 69 —23 "74 


Section 1 
n 20 20 19 17 21 


T 58 .59 .82 46 .58 .54 28 .69 .80 


(a) Test the hypothesis that these samples are from a common population. (b) If the hypothesis of sampling from a common population is tenable, establish the fiducial limits of the population value at the 5 per cent level.

11. Dressel (1939) reports the following r's between the high-school grades of students and their later college grades. Each r refers to the students of a particular high school. (a) Test the values for homogeneity of variance and interpret your results.

School  1    2    3    4    5    6    7    8    9    10   11   12   13   14   15
n       22   25   48   67   27   60   42   98   40   47   44   112  93   54   31
r       .47  .65  .66  .49  .41  .87  .64  .77  .55  .64  .62  .62  .62  .44  .60


(b) If the hypothesis of random sampling from a common population is tenable, establish the fiducial limits of the population value at the 5 per cent level.

12. About 1,500 students in the high schools of Seattle were 
given 2 forms of an attitude test. Samples of 100 papers were 
randomly drawn from the entire set of papers, and the correlation 
(reliability coefficient) was computed between scores on the 2 
forms of the test for each sample. The obtained values for 5 
independently drawn samples were .87, .90, .82, .79, and .91. 
(a) Test these values for homogeneity. (b) If the hypothesis 
of sampling from a common population is tenable, combine the 
values to obtain an estimate of the population value and establish 
the fiducial limits at the 5 per cent level. 

13. Grafton (1948) correlated scores upon a test of technical 
vocabulary in psychology with scores upon a number of other 
tests for a small group of graduate students in psychology. The 
correlations were as follows:


Test                          r     n
C.A.V.D. Completions         .48   44
C.A.V.D. Vocabulary          .61   44
Graduate Record Verbal       .58   49
Graduate Record Index        .33   24
Graduate Record Psychology   .68   48


Test the r's for homogeneity. It should be noted that in
this instance the test is not exact for the reason that the above 
samples are not independent, the subjects being the same in the 
various n’s except for missing cases. 

14. Here is another set of r's for practice. Test for homo- 
geneity. 


Sample   1    2    3    4    5    6
n        20   33   22   58   105  47
r        .08  .30  .40  .60  .52  .46


15. The correlation coefficient between the C.A.V.D. Arith- 
metic Test and grades in a course in elementary statistics was .56 
for a class of 48 students. For another section of 44 students, the 
correlation between the same measures was .45. Test the significance of the difference between these 2 r's.

16. Twenty subjects were divided at random into 2 groups of 10 cases each. One group was then assigned to an experimental condition and the other group served as a control. Each group was given an initial test on the variable on which outcomes were to be measured. The experimental group was then subjected to the experimental condition, and both groups were retested. The correlation coefficient between the initial and final test for the experimental group was .62, and for the control group the correlation was .73. Test the significance of the difference between these 2 correlation coefficients.

17. It was pointed out in the chapter that the χ² test might also be used to test the significance of the difference between 2 r's. This hypothesis is, however, tested more conveniently by finding the standard error of the difference between the 2 corresponding values of z' and then expressing the difference between the z' values as a relative deviate or value of z. Suppose that for one sample we have r = .53 and n = 33, and that for the other sample we have r = .62 and n = 58. Test the significance of the difference between these 2 values by both methods. You will have 1 degree of freedom for χ², and you should find that χ² = z².


CHAPTER 8 


The t Test and the Significance of Means
and Differences between Means


1. THE SAMPLING DISTRIBUTION OF THE MEAN 


Let us take a continuous variable X which is normally distributed, with population mean equal to m and standard deviation equal to σ. If random samples of size N are drawn from this population, the sampling distribution of the means of the samples will tend toward normality as the number of samples becomes indefinitely large. The standard deviation of the sampling distribution of the means, which we have called the standard error, will be given by $\sigma_{\bar X} = \sigma/\sqrt{N}$.

The numerical value of the standard error will obviously be a 
function of the variability in the population and also of the sample 
size. If the samples are drawn from a population with standard 
deviation equal to 10.0, then the standard error of the sampling 
distribution of means based upon 25 cases each will be equal to 2.0. 
If the population standard deviation is 20.0, then the standard 
error of the mean for samples of 25 cases would be 4.0. To reduce 
either of these standard errors by one-half, it would be necessary to
quadruple the sample size. With samples of 100 cases each, the 
standard error for the samples drawn from the population with 
standard deviation of 10.0 would be 1.0, and for the samples from 
the population with standard deviation of 20.0 the standard error 
would be 2.0. 
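The arithmetic of this paragraph can be checked with a few lines of Python. This is a minimal sketch; the function name is hypothetical. It shows that quadrupling N from 25 to 100 halves σ/√N for either population standard deviation.

```python
import math


def standard_error_of_mean(sigma, n):
    """Standard deviation of the sampling distribution of means: sigma / sqrt(N)."""
    return sigma / math.sqrt(n)


for sigma in (10.0, 20.0):
    for n in (25, 100):
        print(f"sigma = {sigma:4.1f}, N = {n:3d}: standard error = "
              f"{standard_error_of_mean(sigma, n):.1f}")
```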

If samples are drawn from a skewed population, the sampling 
distribution of the means will not be normal for small samples, but 
will approach normality as the sample size increases. 

We made use of these facts in previous tests of significance 
involving the binomial distribution. There we found that when 



MEANS AND DIFFERENCES BETWEEN MEANS 143 


p = q, the distribution was symmetrical, and with n as small as 10 a good approximation of a normal distribution was obtained. Even when p was equal to ¼, giving a nonsymmetrical distribution, with np equal to 5 a fairly good approximation of a normal distribution was obtained. If we had taken a somewhat more skewed distribution, say with p equal to .2, the approximation of a normal distribution would be fairly good if n was at least 25. A still better approximation would be obtained if n was as large as 50.

With a normal distribution with mean m and standard deviation σ, we were able to determine the relative frequency with which

$$z = \frac{X - m}{\sigma}$$

would occur as a result of random sampling. This we did by entering the table of the normal curve with the value of z. By this method we were able to determine the probability that $(X - m)/\sigma$ exceeded any specified value. It is important to note that the use of the normal distribution table, in this instance, depends upon knowing the values of m and σ. In all of the problems involving the binomial distribution, these were given exactly by hypothesis.


2. THE t DISTRIBUTION 

In this chapter we shall be concerned with a continuous variable X, which we shall assume to be normally distributed. From a random sample of N values of X, we shall be able to find the mean. The population mean m, the mean of the sampling distribution, may be specified by hypothesis. But we shall not know the exact value of the population standard deviation σ. Instead, we shall have available only s, the standard deviation of the sample, based upon the variation of the N values of X about the sample mean. If the value of σ were known, we could find $\sigma_{\bar X}$, the standard error of the mean, and obtain $(\bar X - m)/\sigma_{\bar X} = z$. The value of z could then be evaluated by means of the table of the normal distribution. But since σ is unknown, we shall have to use the estimate of σ given by the value of s to obtain the standard error of the mean. The difference $\bar X - m$ divided by this standard error of the mean we shall define as t. Thus,

$$t = \frac{\bar X - m}{s_{\bar X}} \qquad (41)$$


The distribution function of t is dependent upon the number of degrees of freedom available in the set of N observations upon which s is based.¹ Hence, the table of t is a two-dimensional table which must be entered with both the obtained value of t and the number of degrees of freedom. The distribution of t is not normal for small samples. It is symmetrical, as is the distribution of z, but beyond a certain point (depending upon the number of degrees of freedom available) the curve of t does not approach the base line as rapidly as does the curve of the z distribution. This means that in order to cut off 2.5 per cent of the total area in each tail of the t distribution, we shall have to go out from the mean beyond the values of plus and minus 1.96 that cut off 2.5 per cent in each tail of the z distribution. Just how far out we shall have to go again depends upon the number of degrees of freedom available.

Table V (Appendix) is a table of the values of t significant at given levels of significance for varying numbers of degrees of freedom. It may be observed from the table that as the number of degrees of freedom increases, a smaller value of t is required for significance. For an infinite number of degrees of freedom, the values of t significant at the 5 and 1 per cent levels correspond to the significant values of z at the 5 and 1 per cent levels, i.e., 1.96 and 2.58, respectively.
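The convergence of the significant values of t toward those of z can be seen by computing a few entries of a table like Table V. This is a sketch assuming SciPy, whose `t.ppf` and `norm.ppf` return percentage points of the t and normal distributions. The two-tailed 5 and 1 per cent points are the 97.5th and 99.5th percentiles.

```python
from scipy.stats import norm, t

# Two-tailed 5 and 1 per cent points of t for several degrees of freedom.
for df in (5, 10, 20, 48, 99, 1000):
    t05 = t.ppf(0.975, df)
    t01 = t.ppf(0.995, df)
    print(f"df = {df:4d}: t.05 = {t05:.3f}, t.01 = {t01:.3f}")

# The limiting values for the normal distribution.
print(f"z:          z.05 = {norm.ppf(0.975):.3f}, z.01 = {norm.ppf(0.995):.3f}")
```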


3. THE FIDUCIAL LIMITS FOR THE MEAN 


Let us take the case of a single random sample from a normally distributed population and see something of the use to which the t distribution may be put. Suppose that we have a


1 The value of s itself is subject to sampling variation. As N increases, the accuracy with which s estimates σ increases also. For very large values of N the discrepancy between s and σ may be small enough to be neglected. Thus, in the limiting case, with N becoming indefinitely large, the distribution of t approaches the distribution of z.


sample mean equal to 62.0, that s, obtained by formula (3), is equal to 14.0, and that the sample size is 49. The standard error of the mean, as given by formula (4), would be equal to 2.0. Then, if the population mean (which will also be the mean of the sampling distribution) is known, we may find

$$t = \frac{\bar X - m}{s_{\bar X}} = \frac{62.0 - m}{2.0}$$

The number of degrees of freedom available for evaluating this t will be equal to N − 1, or 1 less than the number of observations in the sample. As was pointed out in the earlier discussions of χ², the number of degrees of freedom available in a set of N observations depends upon the number of restrictions placed upon the observations. In calculating s, which is used to find the standard error of the mean, the mean of the sample was first found. The value of s was then based upon the variation of the N observations about the sample mean. Since the sum of the deviations Σ(X − X̄) must equal zero, only N − 1 of the observations are free to vary, the last observation being fixed. The number of degrees of freedom for s is clearly indicated in the denominator of the formula for s, formula (3).

In the present problem N is equal to 49, so that we shall have 48 degrees of freedom for evaluating t. By reference to the table of t, we find that a value of 2.01 will be required for significance at the 5 per cent level. Now we do not know the value of m, but we do know the value of t which would be significant at the 5 per cent level. We also know the value of the sample mean and the value of the standard error of the mean. If we abide by the usual practice, we shall regard any hypothesis about m as tenable when the probability attaching to t is greater than .05, and as untenable when the probability is less than .05. Thus, in the present


2 Walker has pointed out that imposing a relationship upon the set of observations is equivalent to estimating a parameter from them. Thus, taking the squares of the deviations from the mean X̄ is equivalent to using X̄ as an estimate of the population mean. In general, the number of degrees of freedom available will be equal to N minus the number of parameters estimated from the set of N observations. Since the mean is the only parameter estimated in finding s, the number of degrees of freedom will be equal to N − 1.


problem, if

$$|t| = \frac{|\bar X - m|}{s_{\bar X}} \geq 2.01$$

the hypothesis concerning m will be rejected. From this we may, as we did before in the case of the correlation coefficient,³ determine the lower fiducial limit ($m_1$) and the upper fiducial limit ($m_2$) at the 5 per cent level as follows:


$$m_1 = \bar X - t_{.05}\,s_{\bar X} \quad \text{and} \quad m_2 = \bar X + t_{.05}\,s_{\bar X} \qquad (42)$$

For the case at hand, the value of X̄ is 62.0, $t_{.05}$ for 48 degrees of freedom is 2.01, and $s_{\bar X}$ is 2.0. Substituting these values in formula (42) and solving for $m_1$ and $m_2$, we have

$$m_1 = 62.0 - (2.01)(2.0) = 62.0 - 4.02 = 57.98$$

and

$$m_2 = 62.0 + (2.01)(2.0) = 62.0 + 4.02 = 66.02$$


If substituted in formula (41), the 2 values which we have found for m, 57.98 and 66.02, would give values of t equal to plus 2.01 and minus 2.01, respectively, for which the value of P would be .05. Any value for m smaller than 57.98 or larger than 66.02 would yield a t value numerically larger than 2.01, for which the value of P would be less than .05. Thus, any hypothesis that the population mean is as low as 57.98 or lower would be regarded as untenable. Similarly, any hypothesis that the population mean is as high as 66.02 or higher would be regarded as untenable. On the other hand, any hypothesis that the population mean falls within the fiducial interval would be regarded as tenable.
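Formula (42) can be sketched in Python for the example above (X̄ = 62.0, s = 14.0, N = 49). This is a minimal sketch assuming SciPy. The function name is hypothetical. SciPy's exact critical value (about 2.011) replaces the tabled 2.01, so the limits agree with 57.98 and 66.02 only to two decimals.

```python
import math

from scipy.stats import t


def mean_fiducial_limits(mean, s, n, level=0.05):
    """Fiducial limits for a population mean, formula (42).

    s is the sample standard deviation with N - 1 in its denominator,
    so the standard error of the mean is s / sqrt(N), with N - 1 df.
    """
    se = s / math.sqrt(n)
    t_crit = t.ppf(1 - level / 2, n - 1)   # two-tailed critical value of t
    return mean - t_crit * se, mean + t_crit * se


m1, m2 = mean_fiducial_limits(62.0, 14.0, 49)
print(f"{m1:.2f} to {m2:.2f}")
```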

If the obtained value of t is a significant value for a given number of degrees of freedom, then we may question the initial assumption upon which t is based, namely, that of random sampling from a normally distributed population with mean equal to m. If the sampling is random, then our doubts would concern the value of m assumed and substituted in formula (41). We thus see that the t distribution may be used to test various hypotheses about the unknown population mean.


3 See formula (38). 




4. INCREASING THE SIZE OF THE SAMPLE 


It should be noted that with a sample as small as the one at 
hand and with variability as great as that observed, we cannot have 
much confidence in estimating the population mean accurately 
from the sample data. We have already noted that tenable
hypotheses about the population value fall within the interval 
57.98 to 66.02, a range of 8.04 points. Increasing the sample size 
to 100 cases, i.e., slightly more than doubling the sample size, will 
serve to reduce the fiducial interval in 2 ways, if the variability of 
the measures remains constant. 

In the first place, the standard error of the mean will now be 14/√100, or 1.4, in contrast to the value of 2.0 when the sample consisted of only 49 cases. In the second place, the value of t which will be significant at the 5 per cent level will now be that for 99 degrees of freedom, approximately 1.984, instead of 2.01, the value at the 5 per cent level for 48 degrees of freedom. The reduction in these 2 values will reduce the fiducial interval accordingly. We would now have

$$m_1 = 62.0 - (1.984)(1.4) = 62.0 - 2.78 = 59.22$$

and

$$m_2 = 62.0 + (1.984)(1.4) = 62.0 + 2.78 = 64.78$$


With 99 degrees of freedom available, hypotheses about the popu- 
lation mean would now be tenable within the fiducial interval of 
59.22 to 64.78, a range of 5.56 points. This fiducial interval may 
be compared with that previously established, 57.98 to 66.02, 
with a range of 8.04 points, when only 48 degrees of freedom were 
available. The moral here is plain: If we are interested in estimating a population mean within fairly narrow limits, large samples
will be necessary if the estimated standard deviation of the popu- 
lation is as large as 14.0. Smaller samples drawn from a less 


variable population may, of course, prove adequate. 
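The two effects of a larger sample, a smaller standard error and a smaller critical t, can be combined in one width computation. This is a minimal sketch assuming SciPy. The function name is hypothetical. Like the text, it assumes that s stays at 14.0 when N grows from 49 to 100.

```python
import math

from scipy.stats import t


def fiducial_interval_width(s, n, level=0.05):
    """Width of the interval in formula (42): 2 * t * s / sqrt(N), N - 1 df."""
    return 2 * t.ppf(1 - level / 2, n - 1) * s / math.sqrt(n)


# Assumes s stays at 14.0 when N grows, as the text does.
w49 = fiducial_interval_width(14.0, 49)
w100 = fiducial_interval_width(14.0, 100)
print(f"N =  49: width = {w49:.2f}")
print(f"N = 100: width = {w100:.2f}")
```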


5. FIDUCIAL PROBABILITY 
The statements of probability made in connection with tests of significance are sometimes misinterpreted. It must be emphasized that the probability read from the table of t (or χ² or z) refers to the probability of t (or χ² or z) under the conditions laid down by the hypothesis tested. Thus, an obtained value of t for which the value of P is .05 means the following: if the hypothesis under which this t was obtained is true, then the probability of obtaining a value of t numerically as large as this or larger is .05. It cannot be argued logically that the probability of the population mean being as low as or lower than the left fiducial point, or as high as or higher than the right fiducial point, is .05. Such arguments are traditionally known as problems in inverse probability, as contrasted with fiducial probability.

Fiducial probability refers to the confidence we have in 
inferences based upon consistently following a defined level of 
significance. If, in random sampling from a normally distributed 
population, we always infer that the population mean falls within 
the fiducial interval as established at the 5 per cent level, 95 times 
in 100 this inference will be correct. This relative frequency of 
“correctness of our inference" may be taken as a statement of 
probability, but the probability refers directly to the inference and 
not the population mean. 
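The relative-frequency reading of fiducial probability can be illustrated by simulation. The sketch draws many samples from a known normal population and counts how often the inference "m lies within the 5 per cent fiducial limits" turns out to be correct. The population values (m = 62.0, σ = 14.0), N = 49, the number of repetitions, and the random seed are all hypothetical choices, borrowed from the earlier example only for concreteness. The sketch assumes NumPy and SciPy. With a finite number of repetitions the proportion should be near .95, not exactly .95.

```python
import numpy as np
from scipy.stats import t

# Hypothetical normal population, borrowing the chapter's figures for concreteness.
m_true, sigma, n, reps = 62.0, 14.0, 49, 10_000
rng = np.random.default_rng(seed=1)

t_crit = t.ppf(0.975, n - 1)
samples = rng.normal(m_true, sigma, size=(reps, n))
means = samples.mean(axis=1)
se = samples.std(axis=1, ddof=1) / np.sqrt(n)        # s / sqrt(N), N - 1 divisor

# Count how often "m lies between the fiducial limits" is a correct inference.
covered = (means - t_crit * se <= m_true) & (m_true <= means + t_crit * se)
coverage = covered.mean()
print(f"inference correct in {coverage:.3f} of {reps} samples")
```

Here the probability attaches to the procedure of drawing an inference. The population mean is fixed in every repetition.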


6. AN EXPERIMENT ON RETENTION 


In an experiment upon the influence of time upon remember- 
ing, a group of 40 subjects was divided at random into 2 groups of 
20 subjects each. Both groups of subjects learned a list of words 
to perfection. One group was then tested for recall of the words 
after an interval of 4 hr.; the other group was tested for recall after 
an interval of 8 hr. The “retention scores" of the 2 groups are 
given in Table 18. 

The mean score for Group 1, the 4-hr. group, will be

$$\bar X_1 = \frac{\sum X_1}{n_1} = \frac{220}{20} = 11.0$$

and the mean score for Group 2, the 8-hr. group, will be

$$\bar X_2 = \frac{\sum X_2}{n_2} = \frac{160}{20} = 8.0$$


We do not know what the population mean $m_1$ is; we have only an estimate provided by the sample mean, and this is equal to 11.0. Nor do we know what the population mean $m_2$ is, but we have an


TABLE 18. Retention Scores of 40 Subjects Divided at Random into 2 Groups of 20 Subjects Each. Group 1 was Tested 4 Hr. after Learning and Group 2 was Tested 8 Hr. after Learning


     Group 1            Group 2
     (4 Hr.)            (8 Hr.)

   12      6          4      1
    6     16         12      8
    7     13          9     10
   12     10          9      8
   10      7         14      6
   16     10          9      8
   13     12         11      9
   14     11          0      9
    9      9          9     10
   14     13         11      3


estimate of it provided by the sample mean of 8.0. If the experiment were repeated under the same conditions an indefinitely large number of times, we would not expect to obtain exactly the same values for $\bar X_1$ and $\bar X_2$ that we did this time. The means of samples are subject to random variation, and this will also be true of the difference between the means of 2 samples. Even when there is no "true" difference between retention after a 4-hr. and after an 8-hr. interval, i.e., when $m_1 = m_2$, random sampling might result in a difference as large as that observed here between the 2 sample means. If we knew how frequently a difference as large as the one observed here would occur as a result of random sampling when $m_1 = m_2$, we would have a basis for evaluating the outcome of the experiment.
Let us set up the hypothesis, therefore, that the samples are from a common normally distributed population. Under this hypothesis, if we consistently subtracted $\bar X_2$ from $\bar X_1$, sometimes the difference would be positive and sometimes negative, but the average difference would be zero. The differences between the means would thus be distributed around zero. Then, if we knew the standard deviation of this distribution of differences, the standard error, it would be possible to evaluate the observed difference by means of the t distribution.


7. THE STANDARD ERROR OF THE DIFFERENCE BETWEEN 2 
MEANS 
The standard error of the difference between the means of 2 independent samples will be given by

$$\sigma_{\bar X_1 - \bar X_2} = \sqrt{\sigma_{\bar X_1}^2 + \sigma_{\bar X_2}^2} \qquad (43)$$


and this may also be written as

$$\sigma_{\bar X_1 - \bar X_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \qquad (44)$$


Although $\sigma_1^2$ and $\sigma_2^2$ are unknown, each might be estimated by means of formula (2), and these estimates may be substituted in (44) above for the unknown parameters. Thus,

$$s_{\bar X_1 - \bar X_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \qquad (45)$$


Under the hypothesis, however, that the 2 samples are from the same normal population, $s_1^2$ and $s_2^2$ are both estimates of the same parameter. The 2 sample sums of squares may therefore be pooled, and this pooled sum of squares may be divided by the pooled degrees of freedom to provide a common estimate $s^2$ of the parameter $\sigma^2$. Let $\sum x_1^2 = \sum (X_1 - \bar X_1)^2$ and $\sum x_2^2 = \sum (X_2 - \bar X_2)^2$; then

$$s^2 = \frac{\sum x_1^2 + \sum x_2^2}{n_1 + n_2 - 2} \qquad (46)$$
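Substituting the common estimate $s^2$ for the values of $s_1^2$ and $s_2^2$ in formula (45) above, we have

$$s_{\bar X_1 - \bar X_2} = \sqrt{\frac{\sum x_1^2 + \sum x_2^2}{n_1 + n_2 - 2}\cdot\frac{1}{n_1} + \frac{\sum x_1^2 + \sum x_2^2}{n_1 + n_2 - 2}\cdot\frac{1}{n_2}} \qquad (47)$$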
and this may be simplified to

$$s_{\bar X_1 - \bar X_2} = \sqrt{\frac{\sum x_1^2 + \sum x_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \qquad (48)$$


It should be noted that $\sum x_1^2$ refers to the sum of the squared deviations of the $n_1$ measures in Group 1 about the mean of Group 1. Similarly, $\sum x_2^2$ refers to the sum of the squared deviations of the $n_2$ measures about the mean of Group 2. A convenient method for calculating these sums of squares is

$$\sum x^2 = \sum X^2 - \frac{\left(\sum X\right)^2}{n} \qquad (49)$$
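Formula (49) and the definitional sum of squared deviations can be compared directly on the retention scores. The two lists below are the 40 scores of Table 18, read row by row across each group's two columns. They reproduce the totals used in the text: ΣX = 220 and 160, and ΣX² = 2,596 and 1,522. The function names are hypothetical.

```python
def sum_of_squares_computational(xs):
    """Formula (49): sum of X^2 minus (sum of X)^2 / n."""
    return sum(x * x for x in xs) - sum(xs) ** 2 / len(xs)


def sum_of_squares_deviations(xs):
    """Definition: sum of squared deviations from the sample mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


# Table 18, read row by row across each group's two columns.
group1 = [12, 6, 6, 16, 7, 13, 12, 10, 10, 7, 16, 10, 13, 12, 14, 11, 9, 9, 14, 13]
group2 = [4, 1, 12, 8, 9, 10, 9, 8, 14, 6, 9, 8, 11, 9, 0, 9, 9, 10, 11, 3]

for name, xs in (("Group 1", group1), ("Group 2", group2)):
    print(name, sum_of_squares_computational(xs), sum_of_squares_deviations(xs))
```

Formula (49) avoids subtracting the mean from every score, which is why it was the convenient route for desk computation.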


By formula (49) we find that the sum of squares within Group 1 is

$$\sum x_1^2 = 2{,}596 - \frac{(220)^2}{20} = 176.00$$

and the sum of squares within Group 2 is

$$\sum x_2^2 = 1{,}522 - \frac{(160)^2}{20} = 242.00$$

Substituting in formula (48) for the standard error of the difference between the means, we have

$$s_{\bar X_1 - \bar X_2} = \sqrt{\frac{176 + 242}{20 + 20 - 2}\left(\frac{1}{20} + \frac{1}{20}\right)} = \sqrt{(11.0)(.10)} = 1.049$$


8. TESTING THE NULL HYPOTHESIS 


Now the null hypothesis to be tested is that $m_1 - m_2 = m = 0$. Hence the observed difference between the means, 11.0 − 8.0 = 3.0, is assumed to be a deviate from the population value m, which is specified as zero. Thus, t will be given by

$$t = \frac{(\bar X_1 - \bar X_2) - m}{s_{\bar X_1 - \bar X_2}} = \frac{(11.0 - 8.0) - 0}{1.049} = \frac{3.0}{1.049} = 2.86$$

The number of degrees of freedom for evaluating the obtained t of 2.86 is indicated by the term in the denominator of the t ratio, $s_{\bar X_1 - \bar X_2}$. This term is based upon $n_1 + n_2 - 2 = 38$ independent observations.⁴ From the table of t we find that a value of 2.711 will be significant at the 1 per cent level for 38 degrees of freedom. The


4 The sums of squares are based upon deviations from the 2 sample means, thus placing 2 restrictions upon the set of $n_1 + n_2$ observations. Consequently, we lose 2 degrees of freedom.




probability of obtaining a value as large as 2.86 is, therefore, less 
than .01 under the hypothesis of random sampling from a common 
population. Abiding by the usual significance standards, the 
hypothesis would be rejected. The 2 means, we conclude, do 
differ significantly; the discrepancy between them is greater than 
we think likely to result from random sampling from a common 
population. 
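The whole test can be sketched in Python on the Table 18 scores, as read row by row across each group's two columns. This is a minimal sketch assuming SciPy; the function name `pooled_t` is hypothetical. The hand computation follows formulas (46) and (48). The result is cross-checked against `scipy.stats.ttest_ind` with `equal_var=True`, which carries out the same pooled-variance t test and reports a two-tailed P.

```python
import math

from scipy.stats import t as t_dist, ttest_ind

# Table 18 retention scores (4-hr. and 8-hr. groups).
group1 = [12, 6, 6, 16, 7, 13, 12, 10, 10, 7, 16, 10, 13, 12, 14, 11, 9, 9, 14, 13]
group2 = [4, 1, 12, 8, 9, 10, 9, 8, 14, 6, 9, 8, 11, 9, 0, 9, 9, 10, 11, 3]


def pooled_t(x1, x2):
    """t for the difference between 2 independent means, formulas (46) and (48)."""
    n1, n2 = len(x1), len(x2)
    mean1, mean2 = sum(x1) / n1, sum(x2) / n2
    ss1 = sum((x - mean1) ** 2 for x in x1)
    ss2 = sum((x - mean2) ** 2 for x in x2)
    df = n1 + n2 - 2                       # 2 degrees of freedom lost to the 2 means
    se_diff = math.sqrt((ss1 + ss2) / df * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_diff, df


t_value, df = pooled_t(group1, group2)
p_two_tailed = 2 * t_dist.sf(abs(t_value), df)
print(f"t = {t_value:.2f}, df = {df}, P = {p_two_tailed:.4f}")

# Cross-check against SciPy's pooled-variance (equal_var=True) t test.
check = ttest_ind(group1, group2, equal_var=True)
print(f"scipy: t = {check.statistic:.2f}, P = {check.pvalue:.4f}")
```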
It may be objected that the conclusion drawn concerning the 
2 means does not follow from the rejection of the hypothesis of 
random sampling from a common population. It might be argued, 
for example, that if the 2 samples are from different populations, 
then it is the population variances which differ significantly and 
not the means. It is true that "a difference in variance between the populations from which the samples are drawn will tend sometimes to enhance the value of t obtained."⁵ For this reason, it is often mistakenly assumed that the t test based upon the pooled sums of squares assumes that the 2 variances are equal. In this connection Fisher has said:


This is an incorrect form of statement; the equality of the variances 
is a necessary part of the hypothesis to be tested, namely that the two 
samples are drawn from the same normal population. The validity of
the t-test, as a test of this hypothesis, is therefore absolute, and requires 
no assumption whatever. It would, of course, be legitimate to make a 
different test of significance appropriate to the question, Might these 
samples have been drawn from different normal populations having the 
same mean? This problem has, in fact, been solved, but in relation to 
the real situations arising in research, the question it answers appears to 
be somewhat academic.⁶


9. THE FIDUCIAL LIMITS FOR THE DIFFERENCE BETWEEN 2 MEANS 


The fiducial limits of the difference between the 2 means may 
be established in the same manner that we did for the case of a 
single mean. At the 5 per cent level, t is 2.025 for 38 degrees of 


5 Fisher (1936), p. 129. Fisher adds: "The theoretical possibility, that a significant value of t should be produced by a difference between the variances only, seems to be unimportant in the application of the method to experimental data. . . ."

6 Fisher (1936), p. 130.


freedom. Then

$$m_1 = 3.0 - (2.025)(1.049) = 3.0 - 2.124 = .876$$

and

$$m_2 = 3.0 + (2.025)(1.049) = 3.0 + 2.124 = 5.124$$


The lower fiducial limit of .876 and the upper fiducial limit of 5.124 
mark the points within which hypotheses about the population 
mean difference will be regarded as tenable. In inferring that 
the population mean difference falls within the fiducial interval, 
we shall be correct in our inference unless a 1 in 20 chance has 
occurred in the sampling for, in the long run, we expect 95 per 


cent of such inferences to be correct. 
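The arithmetic of the fiducial limits can be sketched in a few lines of Python. This is a minimal illustration, assuming the tabled value of t (2.025 for 38 degrees of freedom) is read from the table by hand, as in the text; the function name `fiducial_limits` is hypothetical.

```python
def fiducial_limits(mean_diff: float, se_diff: float, t_tabled: float):
    """Return (lower, upper): the difference minus and plus t times its standard error."""
    half_width = t_tabled * se_diff
    return mean_diff - half_width, mean_diff + half_width


# Values from the text: observed difference 3.0, standard error 1.049,
# tabled t = 2.025 for 38 degrees of freedom at the 5 per cent level.
m1, m2 = fiducial_limits(3.0, 1.049, 2.025)
print(f"m1 = {m1:.3f}, m2 = {m2:.3f}")  # m1 = 0.876, m2 = 5.124
```

Passing the tabled value of t at the 1 per cent level instead would give the wider 99 per cent limits.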


10. ESTIMATING THE SIZE OF THE SAMPLE FOR A REPETITION OF THE EXPERIMENT


Let us suppose that an experiment has been carried out with 2 groups of 10 cases each and that s², the estimate of the common population variance based upon the pooled sums of squares and degrees of freedom, is equal to 20.0. Let us assume also that the observed difference between the 2 means is equal to 3.8. In this particular case, where n₁ = n₂ = n, formula (48) reduces to

$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{2s^2}{n}} \qquad (50)$$

The standard error of the difference between the 2 means, for the problem at hand, will thus be

$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{(2)(20)}{10}} = 2.0$$

Under the hypothesis of random sampling from a common normal population, we obtain

$$t = \frac{3.8}{2.0} = 1.9$$

which, for 18 degrees of freedom, we find by reference to the table of t is not significant at the 5 per cent level, the tabled value being 2.101.




Now the number of cases used in the 2 groups in this experi- 
ment is quite small and the observed difference approaches the 
significance level but does not quite reach it. From an examina- 
tion of any of the formulas for the standard error we know that 
increasing the sample size will tend to decrease the value of the 
standard error. By increasing the number of subjects in his 2 
groups, the experimenter could thus increase the sensitiveness of 
the test of significance. With a larger number of subjects, in 
other words, the difference observed between the 2 means might 
be significant and the null hypothesis rejected. If the experi- 
menter does decide to repeat the experiment, an important ques- 
tion which confronts him is: How many subjects should he use in 
each of the 2 groups? 

Let us assume that if the experiment were repeated the ratio

$$\frac{s^2}{(\bar{X}_1 - \bar{X}_2)^2} = C^2 \qquad (51)$$

will remain constant. C, which is the square root of the above quantity, or s/(X̄₁ − X̄₂), is called the coefficient of variation.⁷ Now the t ratio, under the hypothesis that m₁ − m₂ = m = 0 and with n₁ = n₂ = n, may be written

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{2s^2/n}} \qquad (52)$$

Squaring both sides of this equation and solving for n, we obtain

$$n = \frac{2t^2 s^2}{(\bar{X}_1 - \bar{X}_2)^2} = 2t^2C^2 \qquad (53)$$


Let us substitute for the observed value of t (1.9) a value t′, which may be defined as the tabled value that would be significant for the number of degrees of freedom available. We have 10 + 10 − 2 = 18 degrees of freedom available, and from the table of t we find that t′ must be 2.101 to be significant at the 5 per cent level. Since C² is assumed to be constant, then n′, the new number of subjects in each group in the repetition of the experiment, will be

$$n' = 2t'^2C^2 \qquad (53a)$$

But from formula (53), C² is equal to n/2t², and substituting this identity for C² in formula (53a) we have

$$n' = 2t'^2\left(\frac{n}{2t^2}\right) = \frac{n\,t'^2}{t^2} \qquad (53b)$$

7 In the case of a single sample, C = s/X̄.
Formula (53b) permits us to find readily the number of subjects to be used in each group in the repetition of the experiment. Solving for n′ in the problem under discussion, we obtain

$$n' = \frac{(10)(2.101)^2}{(1.9)^2} = 12.2$$

Thus, if the experiment were repeated, the experimenter would probably want to have, as a minimum, 12 subjects in each of his 2 groups.⁸
Suppose now that the experiment is repeated with 12 subjects in each group and that C² does remain constant. Then, by formula (52), we obtain

$$t = \frac{3.8}{\sqrt{(2)(20)/12}} = \frac{3.8}{\sqrt{3.3333}} = 2.08$$

It may be observed that, within limits of rounding errors, the value of t now obtained is the same as the value which we used in solving for n′ and is a value which would be regarded as significant at the 5 per cent level if only 18 degrees of freedom were available.⁹ But we now have 12 + 12 − 2 = 22 degrees of freedom available for evaluating t, and the value of P will obviously be somewhat less than .05.


8 If t′ had been taken as the value which would be significant at the 1 per cent level (2.878) instead of the value significant at the 5 per cent level (2.101) for 18 degrees of freedom, then we would have n′ equal to 22.9, or 23 subjects in each group.
9 Using 12.2, an impossible value, instead of 12 would have given t = 2.101.
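Formulas (50) and (53b) reduce the planning step to two lines of arithmetic. The sketch below reproduces the worked figures, assuming equal groups and a C² that stays constant, exactly the assumption the text makes; the tabled values 2.101 and 2.878 are typed in by hand, and the function names are hypothetical.

```python
import math


def se_diff_equal_n(s2: float, n: int) -> float:
    """Formula (50): standard error of the difference when n1 = n2 = n."""
    return math.sqrt(2 * s2 / n)


def n_for_repetition(n: int, t_observed: float, t_required: float) -> float:
    """Formula (53b): n' = n * t'^2 / t^2, valid only if C^2 stays constant."""
    return n * t_required ** 2 / t_observed ** 2


s2, n, diff = 20.0, 10, 3.8            # pooled variance, cases per group, mean difference
se = se_diff_equal_n(s2, n)            # 2.0
t = diff / se                          # 1.9, short of the tabled 2.101 for 18 df
print(f"n' at 5 per cent: {n_for_repetition(n, t, 2.101):.1f}")      # 12.2
print(f"n' at 1 per cent: {n_for_repetition(n, t, 2.878):.1f}")      # 22.9
print(f"t with 12 per group: {diff / se_diff_equal_n(s2, 12):.2f}")  # 2.08
```

The result is a planning figure, not a guarantee: it inherits whatever sampling error is in the first experiment's s² and mean difference.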




11. THE INFLUENCE OF CHANGES IN C² ON THE SUBSEQUENT VALUE OF t


The procedure described, of course, offers no sure guarantee that the value of t obtained with the repetition of the experiment and with the number of subjects given by formula (53b) will be equal to 2.101. Suppose, for example, that the value of C² is greater in the repetition of the experiment than in the first trial. This will be the case if the observed difference between the means is now 2.0 instead of 3.8 and if s² remains the same. The value of C² will now be 5.0 instead of 1.39, as it was in the first trial. Solving for t, we now obtain

$$t = \frac{2.0}{\sqrt{(2)(20)/12}} = \frac{2.0}{\sqrt{3.3333}} = 1.10$$

Let us take the other case, where the value of C² is smaller in the repetition of the experiment. This will be true, for example, if the difference between the means is now 5.37 and if s² is 30.0, so that C² now equals 1.04. Then t will be

$$t = \frac{5.37}{\sqrt{(2)(30)/12}} = \frac{5.37}{\sqrt{5.0}} = 2.40$$


From the numerical examples cited, it should be clear that an increase in the value of C² will result in a smaller value of t, whereas a decrease in the value of C² will result in a larger value of t than the value of 2.101 which was used in formula (53b) in solving for n′. Whether the repetition of the experiment with the larger values of n₁ and n₂ will result in a significant value of t obviously depends upon the reasonableness of the assumption involved in arriving at the estimate of n′ by means of formula (53b). One hazard in the procedure is that with relatively small samples, both the value of s² and the value of the observed difference between the means may show relatively wide fluctuations in successive random samples. This, in turn, means that the value of C², in any single experiment, may be a rather unreliable estimate of what the value of this same ratio will be in successive samples or, in other words, in the repetition of the experiment.

We may note, however, that both the numerator and the denominator of C² are unbiased estimates of population values. Thus, the chances of C² being larger in the repetition of the experiment are equal to the chances of C² being smaller. We thus have a 50-50 chance of obtaining a value of t equal to or larger than t′ in the repetition of the experiment. But because of the larger number of degrees of freedom which will be available for evaluating t, whatever its value may be, the odds are slightly in favor of a significant value with P equal to or less than .05.
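Squaring formula (52) and substituting C² = s²/(X̄₁ − X̄₂)² gives t² = n/(2C²), so with n fixed the value of t in the repetition depends only on C². The sketch below recomputes the three cases for 12 subjects per group; the function name `t_from_c2` is hypothetical, and the second case prints 1.10 (1.095 before rounding).

```python
import math


def t_from_c2(n: int, c2: float) -> float:
    """From formula (52): t^2 = n / (2 C^2) for 2 groups of n, with C^2 = s^2 / (X1 - X2)^2."""
    return math.sqrt(n / (2 * c2))


n = 12  # subjects per group in the repetition
cases = [
    ("first trial", 20.0, 3.8),   # label, s^2, observed difference
    ("larger C^2", 20.0, 2.0),
    ("smaller C^2", 30.0, 5.37),
]
for label, s2, diff in cases:
    c2 = s2 / diff ** 2
    print(f"{label}: C^2 = {c2:.2f}, t = {t_from_c2(n, c2):.2f}")
# first trial: C^2 = 1.39, t = 2.08
# larger C^2: C^2 = 5.00, t = 1.10
# smaller C^2: C^2 = 1.04, t = 2.40
```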


12. EXAMPLES 
1. Given a sample of 16 cases with mean equal to 22.4 and s equal to 4.3, establish the fiducial limits of the parameter at the 5 per cent level.

2. The mean score on a vocabulary test for a sample of 200 college freshmen is 133.8 with s equal to 14.7. At another college a sample of 140 freshmen has a mean score of 138.4 and s equal to 15.2 for the same test. Test the hypothesis that the 2 samples are from a common population.

3. Forty subjects are divided at random into 2 groups. One 
group is given training in problem solving by emphasis upon general 
methods and procedures. The other group is given practice in the actual solving of problems. The 2 groups are then tested on a problem-solving test. The scores are as follows:

Methods Group Practice Group


39 41 39 44 36 41 30 39 
39 40 39 40 36 39 33 37 
37 42 37 43 35 42 36 37 
44 38 38 38 s z z a1 
43 38 41 39 


If you have no calculating machine, you may find the computations somewhat easier if you subtract a constant, say 30, from each of the scores above. If the same constant is subtracted from the scores of each group, this will not influence the difference between the means. Nor will the subtraction of a constant influence measures of variability such as the standard deviation. The test of significance of the difference between the means of the 2 groups may therefore be made with the “coded” scores.
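The claim that coding leaves the difference between means and the standard deviation unchanged is easy to check numerically. This sketch uses a few hypothetical scores (not the data of any example) and a constant of 30; the helper name `coded` is illustrative only.

```python
import statistics

# Hypothetical scores, chosen only to illustrate coding; not taken from any example.
group_1 = [39, 41, 44, 38, 43]
group_2 = [36, 30, 33, 35, 37]
CONSTANT = 30


def coded(scores, constant=CONSTANT):
    """Subtract the same constant from every score."""
    return [x - constant for x in scores]


raw_diff = statistics.fmean(group_1) - statistics.fmean(group_2)
coded_diff = statistics.fmean(coded(group_1)) - statistics.fmean(coded(group_2))
print(raw_diff, coded_diff)                                          # same difference
print(statistics.stdev(group_1), statistics.stdev(coded(group_1)))  # same standard deviation
```

Subtracting a constant shifts both means by the same amount, so their difference is unchanged; deviations from the mean are unchanged too, so the sums of squares and s are untouched.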

4. Morgan (1945) designed an experiment to test the hypoth- 
esis that failures to solve a problem tend to foster inductive 
reasoning more than immediate success. “S’s were confronted 
with the problem of discovering which of six cues to follow in order 
to make a bell ring. In one group (called the restricted hypothesis group) the cue which would make the bell ring was predetermined
by the E. In another group (called the unrestricted hypothesis 
group) success followed the use of any cue by the S. Interspersed 
throughout the experiment were test series to determine how well 
the S’s in both groups could discover a predetermined cue" (p. 146). 
The question is whether the individuals in the restricted hypothesis 
group profited by the mistakes made in searching for the correct 
cue and surpassed, on the test series, the performance of the sub- 
jects in the unrestricted group. The data are as follows: 


Unrestricted Hypothesis Group: 6, 7, 8, 10, 10, 12, 12, 12, 13, 14, 14, 14, 15, 15, 16, 19, 23, 24, 30, 34, 35
Restricted Hypothesis Group: 5, 6, 6, 7, 8, 8, 8, 9, 9, 9, 9, 10, 10, 11, 12, 13, 13, 15, 15, 25


Test the hypothesis that the 2 samples are from a common 
population. 

5. Butsch (1932) recorded for subjects the average number 
of spaces the eye was in front of the hand in typing a standard 
selection. This measure is called the eye-hand span. These 
measurements were made by an elaborate device for photographing 
eye movements. Records were also obtained of the speed at which
the subjects typed. Only part of the data is given here—the 
records for the 60-word-a-minute group and the 70-word-a-minute 
group. 




Length of Span   60-Word Group (f)   70-Word Group (f)
13               1
12               3                   3
11               2                   1
10               2                   6
9                3                   10
8                12                  7
7                4                   16
6                17                  16
5                18                  19
4                12                  9
3                27                  9
2                9                   6
1                5                   3
0                5                   2

The frequency distributions of length of span for the 2 groups are given above. Test the hypothesis that the 2 groups are from a common population.


6. Two groups of high-school students were given an attitude test. The students were from 2 different high schools. Test the hypothesis that the 2 groups are from a common normally distributed population. The scores are as follows:


High School A High School B 
127 28) 188 143 256 sg 200 211 172 251 
241 172 204 226 907 119 204 204 
247 239 153 246 274 2939 121 241 
102 252 187 101 129 276 197 147 
185 173 308 218 201 178 242 188 


Computations will be somewhat easier if you subtract 100 from the scores of the members of each group. Do not forget that this will give you a negative value for the score of 88 in Group B. The difference between the means of the 2 groups will be unchanged by this subtraction, and the sums of squares will be uninfluenced. The t test may be applied directly to the “coded” scores.

7. An experiment involving 2 independent random samples of 25 cases each yields the following data: Sample I: Σx₁² = 420; Sample II: Σx₂² = 482. The difference between the means of the 2 groups is 3.03. May we reject the hypothesis of random sampling from a common population at the 5 per cent level?

8. Twenty-six subjects were divided at random into 2 groups. 
One group was motivated by condition A and the other group by
condition B. The performance scores of the 2 groups in a test 
situation are as given below: 


Condition A Condition B 


52 171 151 45 71 86 218 165 
75 54 101 95 141 152 

170 104 74 151 52 120 
30 81 146 53 108 115 


Test the hypothesis that the 2 groups are random samples 
from a common population. 

9. Note the range of scores in Example (8) and Example (6). 
The extent of the variability within groups in these 2 examples 
indicates that larger n’s will be required to detect differences 
between the means if such differences do actually exist. (a) Using 
the method described in the chapter, assuming that the difference 
between the means and the estimate of the variance will remain 
relatively the same, solve for the n required in each group in 
Example (6). (b) Do the same for Example (8). 

10. A group of 40 rats was divided at random into 2 groups 
of 20 each. One group was given a 12-hr. period in a maze and 
permitted to run about freely. The other group had no prior 
experience with the maze. Both groups were deprived of food for 
the same length of time and tested in the maze. Records were 
kept of the number of trials required to learn the maze to the 
criterion of one perfect run. The number of trials per animal in 
each group is given below: 


No-Experience Group Prior-Experience Group 


10 7 9 6 12 ri 9 6 
8 6 10 13 5 9 9 9 
9 7 12 12 6 4 8 4 

15 6 9 11 9 10 11 6 
9 13 4 9 9 7 10 7 


Test the hypothesis that the 2 groups are random samples 
from a common population. 




11. In testing a control and an experimental group, consisting 
of 10 subjects each, the standard error of the difference between the 
means is found to be equal to 1.42 and the difference between the 
means equal to 2.56. If the experiment is repeated, about how 
many subjects should be used in each group? Assume that the 
variance and the difference between the means will remain rela- 
tively the same or that, in other words, the coefficient of variation 
will remain constant. 

12. Two random samples are formed from a group of 30 sub- 
jects. One group consists of 10 subjects, the other consists of 20 
subjects. Members of the first group are given a placebo and told 
that it is a drug designed to raise their level of performance. 
Members of the second group are given the same placebo but are 
told that the influence of the drug is to make them feel tired and 
depressed. Performance of the 2 groups is then tested by means 
of a tapping test. Scores have already been reduced by the subtraction of a constant.


Group 1 Group 2 

7 4 10 14 4 8 
12 13 4 11 14 10 
17 9 5 12 11 9 
14 12 10 7 8 7
12 13 8 12 5 11 


Test the hypothesis that the 2 groups are random samples from a common population.


CHAPTER 9 


Heterogeneity of Variance and the t Test 


1. INTRODUCTION 


In the last chapter we had occasion to mention that the 
rejection of the hypothesis that 2 samples have been randomly 
drawn from a common normal population, as a result of the application of the t test, may raise some additional hypotheses. Let us
suppose that subjects have been randomly assigned to a control 
and an experimental condition or to two experimental conditions. 
It may sometimes happen that one of the conditions will serve 
to increase or decrease the variance as well as influence the mean 
of the variable being investigated. 

If the experimenter observes that the variance within one 
of his groups is quite a bit larger than the variance within the 
other group, and if the t test has resulted in the rejection of the
hypothesis that the 2 samples are from a common population, he 
may still wonder whether the hypothesis of a common population 
variance is tenable. If a test of significance indicates that the 
samples could have been drawn from populations with a common 
variance, then obviously the basis of the significant value of t 
must be the difference between the means. 

A supplementary test of the hypothesis of homogeneity of 
variance, as it is often called, may be applied whenever it appears 
that one of the 2 sample variances is much larger than the other. 
The question as to how much larger depends upon the number of 
observations in the 2 groups. If the 2 samples each have 10 or 
fewer cases, then one of the 2 variances will have to be approxi- 
mately 4.5 times as large as the other in order that the difference 
may be significant at the 5 per cent level. With 20 observations 
in each group, one of the 2 variances will have to be approximately 
2.5 times as large as the other in order that they may differ significantly. With still larger samples, in the neighborhood of 30
cases each, a variance which is 2.0 times as large as the other will 
be sufficient to reject the hypothesis of a common variance. If 
these guideposts are kept in mind, it can readily be determined 
whether or not the supplementary test of homogeneity of variance 
should be applied. The test, of course, is always in order and 
may be applied in any case. It is relatively simple to perform. 


2. THE F DISTRIBUTION 


The test of significance is based upon the hypothesis that the 2 samples have been drawn from a population or populations with a common variance, so that σ₁² = σ₂² = σ². Under this hypothesis, s₁² and s₂² are both estimates of the same parameter and may be expected to differ only as a result of random sampling. The ratio of these 2 sample variances is distributed in a manner discovered by Fisher (1936),¹ and tables of the ratio have been constructed by Snedecor (1946a), who named the ratio F in Fisher's honor.
Now if F is defined as

$$F = \frac{s_1^2}{s_2^2} \qquad (54)$$


or as the ratio of 2 variances, then whether F will be larger than 1.00 or smaller than 1.00 will depend merely upon whether s₁² or s₂² is put in the numerator of the ratio. The tabled values of F (Table VIII, Appendix) are for the right tail of the F distribution, so that F must always be greater than 1.00 if the table is to be used. Hence, the test will be made by placing either s₁² or s₂² in the numerator of the ratio, depending on which is the larger of the two. The problem at hand is one of determining whether s₁² and s₂² differ significantly, and this is a two-tailed test; i.e., we must be prepared to reject the hypothesis of a common population variance if F is either significantly larger or significantly smaller than 1.00. Now it can be shown that the significance level is given by doubling the probability of the values of F at the 1 and 5 per cent points.² That is, in a test of the nature described here, if the larger value of s² is always placed in the numerator of the F ratio, the tabular probabilities at the 1 and 5 per cent points correspond to the probabilities at the 2 and 10 per cent levels of significance, respectively. If we are to reject the hypothesis at the 5 per cent level, it will be necessary to interpolate between the 1 and 5 per cent points of F in the table to obtain the 5 per cent significance level.

1 Fisher's Statistical Methods for Research Workers was first published in 1925. The 1936 edition, which is the one cited here, is the sixth edition of this classic work. Still later editions have appeared at intervals of about 2 years since the 1936 edition.


3. TESTING FOR HOMOGENEITY OF VARIANCE 


Let us take a numerical example. In the experiment upon 
retention after a 4-hr. and an 8-hr. interval, described in the last 
chapter, the sum of squares within Group 1, the 4-hr. group, was 
176.0 and within Group 2, the 8-hr. group, the sum of squares 
was 242.0. Then, by formula (2), the variance within each of 
the 2 groups would be 


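$$s_1^2 = \frac{176}{19} = 9.263 \quad \text{and} \quad s_2^2 = \frac{242}{19} = 12.737$$

The F ratio, by formula (54), with the larger variance in the numerator, would then be

$$F = \frac{12.737}{9.263} = 1.375$$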


The table of F, Table VIII (Appendix), differs from any of 
the tables so far considered in that it is a three-dimensional table. 
It must be entered with the value of F and also with the degrees 
of freedom corresponding to the 2 values of s? in the ratio. The 
rule is that we enter the column of the table of F with the number 
of degrees of freedom corresponding to the numerator of the ratio 
and we run down the column until we find the row entry cor- 
responding to the number of degrees of freedom of the denominator. 

In the present problem, we are already aware of the fact that both s₁² and s₂² are based upon 19 degrees of freedom. So we enter the column of the table of F with 19 degrees of freedom and find the row entry corresponding to 19 degrees of freedom. The table indicates that a value of approximately 3.00 would be required for significance at the 1 per cent point or at the 2 per cent level. A value of F equal to approximately 2.15 would be significant at the 5 per cent point or at the 10 per cent level. By rough interpolation between the 2 tabled values, we find that an F equal to approximately 2.5 would be significant at the 5 per cent level. Our obtained value of F, 1.375, is less than the value significant at the 5 per cent level, and therefore the P value attaching to it must be greater than .05. The hypothesis of a common population variance would thus be regarded as tenable.

2 Hoel (1947), p. 153.

Since the t test, as described in the last chapter, yielded a significant value of t, the hypothesis of random sampling from a common population was rejected. But the F test for homogeneity of variance indicates that the samples could very well have been drawn from a population or populations with a common variance. Hence, it must be the population means that differ.

The results obtained above will be the usual case in experi- 
mental work, if subjects are assigned at random to the experi- 
mental conditions. A significant difference in variances may 
result in a somewhat larger value of t than would otherwise be 
obtained, but it seems unlikely that a significant value of t will be
produced only by a difference in variances. 
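The homogeneity test just worked through amounts to two divisions and a comparison. In this sketch the critical value 2.5 is the text's rough interpolation for 19 and 19 degrees of freedom at the 5 per cent (two-tailed) level, typed in by hand; the function names are hypothetical.

```python
def variance_from_ss(ss: float, df: int) -> float:
    """Formula (2): variance = sum of squares / degrees of freedom."""
    return ss / df


def f_ratio(var_a: float, var_b: float) -> float:
    """Formula (54), with the larger variance in the numerator so the right-tail table applies."""
    return max(var_a, var_b) / min(var_a, var_b)


s2_1 = variance_from_ss(176.0, 19)   # 4-hr. group
s2_2 = variance_from_ss(242.0, 19)   # 8-hr. group
F = f_ratio(s2_1, s2_2)

# Two-tailed 5 per cent value for 19 and 19 df, found in the text by rough
# interpolation between the tabled 1 and 5 per cent points (about 3.00 and 2.15).
F_CRITICAL = 2.5
print(f"s1^2 = {s2_1:.3f}, s2^2 = {s2_2:.3f}, F = {F:.3f}")  # s1^2 = 9.263, s2^2 = 12.737, F = 1.375
print("reject common variance" if F >= F_CRITICAL else "common variance tenable")
```

Always putting the larger variance on top is what makes the tabled 5 per cent point correspond to the 10 per cent level of the two-tailed test.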


4. THE EFFECTS OF NONNORMALITY

When differences in variance are observed from one experimental condition to another, it is often an indication of nonnormality of the variable being measured. The t test, in which the standard error of the difference is based upon the pooled sums of squares and degrees of freedom, as in formula (48), is directed toward testing the hypothesis that 2 samples have been randomly drawn from a common normal population.³ If the variable measured is not normally distributed, what influence will this have upon the distribution of t? To what extent will the probabilities read from the table of t be in error?

3 Fisher (1936), p. 129.

This problem has been investigated, and fortunately the evidence indicates that the two-tailed t test will be relatively little
influenced by departures from normality. Benepe (1949), for 
example, made an empirical sampling study of the distribution of 
t for small samples. He set up a normal population and then 
3 nonnormal populations with increasing degrees of skewness.⁴ Pairs of random samples of 5 cases each were drawn from each population, and the values of t were computed. Five hundred values of t were obtained for each of the populations. For the normal population, the per cent of the t's which equaled or exceeded the tabled value at the 5 per cent level for 8 degrees of freedom was 3.4. For the nonnormal populations, in terms of increasing skewness, the per cents were, respectively, 3.0, 4.0, and 2.8. For the normal population, the per cent of the t's which equaled or exceeded the tabled value at the 1 per cent level for 8 degrees of freedom was 0.0. For the nonnormal populations, in terms of increasing skewness, the per cents were, respectively, 0.8, 0.8, and 0.6.
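An empirical sampling study of this kind can be imitated with a short Monte Carlo sketch. It draws pairs of samples of 5 cases from one population and counts how often |t| reaches the tabled 5 per cent value 2.306 for 8 degrees of freedom. The populations used here (normal and exponential) and the 5000 repetitions are assumptions for illustration; they are not Benepe's populations, so the per cents will not reproduce his figures.

```python
import math
import random
import statistics

T_05_8DF = 2.306  # tabled two-tailed 5 per cent value of t for 8 degrees of freedom


def pooled_t(a, b):
    """t with the standard error from the pooled sums of squares, as in formula (48)."""
    n1, n2 = len(a), len(b)
    s2 = ((n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(s2 * (1 / n1 + 1 / n2))


def normal_draw(rng):
    return rng.gauss(0.0, 1.0)


def skewed_draw(rng):
    # An assumed right-skewed (exponential) population, chosen only for illustration.
    return rng.expovariate(1.0)


def rejection_rate(draw, reps=5000, n=5, seed=1):
    """Share of sample pairs, both from one population, with |t| at or beyond the tabled value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        a = [draw(rng) for _ in range(n)]
        b = [draw(rng) for _ in range(n)]
        if abs(pooled_t(a, b)) >= T_05_8DF:
            hits += 1
    return hits / reps


print(f"normal: {rejection_rate(normal_draw):.3f}")
print(f"exponential: {rejection_rate(skewed_draw):.3f}")
```

Because both samples always come from the same population, every rejection is a sampling accident, so the printed per cents estimate the true probability behind the tabular 5 per cent level.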

In summarizing the results of other investigations,⁵ Cochran (1947) states:


The consensus from these investigations is that no serious error is introduced by non-normality in the significance levels of the F-test or of the two-tailed t-test. While it is difficult to generalize about the range of populations that were investigated, this appears to cover most cases encountered in practice. If a guess may be made about the limits of error, the true probability corresponding to the tabular 5 percent significance level may lie between 4 and 7 percent. For the 1 percent level, the limits might be taken as ½ percent and 2 percent. As a rule, the tabular probability is an underestimate: that is, by using the ordinary F and t tables we tend to err in the direction of announcing too many significant results.⁶


Cochran goes on to point out that the one-tailed t test is more likely to be influenced by departures from normality than the two-tailed test. In the one-tailed test, taking the probability as one half that of the tabled value may seriously overestimate or underestimate the true probability when nonnormality is present.⁷

4 Following Fisher (1936), g₁ was used to measure the skewness of each of the populations investigated. The values of g₁/s_g₁ obtained were −2.518, −3.516, and −14.686 for the 3 nonnormal populations. Since a value of g₁/s_g₁ equal to 1.96 may be regarded as significant at the 5 per cent level, degrees of freedom equal to ∞, these populations all represent significant degrees of skewness.
5 Pearson (1931), Bartlett (1935), Hey (1938).
6 Cochran (1947), p. 24.


5. TESTING THE HYPOTHESIS OF A COMMON POPULATION MEAN WHEN n₁ AND n₂ DIFFER


Let us suppose that a significant value of F has been obtained in an experiment. The experimenter would then reject the hypothesis of a common population variance. We assume that he has also applied the t test and that this has resulted in the rejection of the hypothesis of random sampling from a common population. The experimenter now knows that his 2 samples are from different populations (from the t test), and he also has evidence that the variances differ significantly (from the F test).
But his primary interest is in the difference between the 2 means. 
Can he test the hypothesis that the 2 samples are from normal 
populations with a common mean, irrespective of the variances? 
Several approximate tests of this hypothesis have been suggested. 

If n₁ and n₂ differ and if the variances of the 2 groups are significantly different as indicated by the F test, a test of the hypothesis that the population mean difference is zero may be made by an approximation method suggested by Cochran and Cox (1944) and described by Snedecor (1946a).⁸ Instead of pooling the sums of squares from the 2 samples and the corresponding degrees of freedom, we compute the variance of each mean separately by means of formula (4). The standard error of the difference is then calculated by means of formula (45).

To illustrate this procedure, suppose that we have one sample with 10 cases and another sample with 20 cases and that the values of the variances are 28.42 and 6.72, respectively. The difference between the 2 means we shall assume is 4.6. Then F = 28.42/6.72 = 4.23, and from the table of F we find this is a significant value for 9 and 19 degrees of freedom. The hypothesis of a common population variance must be rejected. Let us now test the hypothesis of a common population mean. From formula (4) we find that

$$s_{\bar{X}_1}^2 = \frac{28.42}{10} = 2.842 \quad \text{and} \quad s_{\bar{X}_2}^2 = \frac{6.72}{20} = .336$$

Then the standard error of the difference between the 2 means will be given by formula (45). Thus,

$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{28.42}{10} + \frac{6.72}{20}} = \sqrt{2.842 + .336} = \sqrt{3.178} = 1.783$$

The obtained value of t will now be given by

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{s_{\bar{X}_1 - \bar{X}_2}} = \frac{4.6}{1.783} = 2.58$$

7 The problem of skewness is discussed in the article by Festinger (1943), with reference to a test of significance for means from such populations.
8 Snedecor (1946a), pp. 83-84.


6. OBTAINING THE VALUE OF t WHICH WILL BE REGARDED AS SIGNIFICANT


To determine whether the obtained value of t is significant at the 5 per cent level, we must first find the tabled value of t at the 5 per cent level for degrees of freedom equal to n₁ − 1 and for n₂ − 1. For the first group, with 9 degrees of freedom, this value is 2.262, and for the second group, with 19 degrees of freedom, the tabled value of t at the 5 per cent level is 2.093. These 2 values, which for convenience may be called t₁ and t₂, are then substituted with the corresponding variances of the means of the 2 samples in formula (55) below. Thus,

$$t_{.05} = \frac{(s_{\bar{X}_1}^2)(t_1) + (s_{\bar{X}_2}^2)(t_2)}{s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2} \qquad (55)$$

The value of t obtained from formula (55) above will be the required value of t which will be regarded as significant at the 5 per cent level.⁹ Substituting, we find that this will be

$$t_{.05} = \frac{(2.842)(2.262) + (.336)(2.093)}{2.842 + .336} = \frac{7.132}{3.178} = 2.244$$

9 It would be possible, of course, to use the tabled values of t at the 1 per cent level in formula (55) if we wished to find the value of t which would be regarded as significant at this level.


The hypothesis that the 2 samples are from populations with a common mean is then evaluated by comparing the obtained value of t, which is 2.58, with the approximate value required for significance at the 5 per cent level derived from formula (55), which is 2.24.¹⁰ Since the observed value of 2.58 exceeds 2.24, the value significant at the 5 per cent level, the hypothesis tested would be rejected.

Since both F and the value of t obtained above, which did not involve any hypothesis about the population variance, are significant, the evidence from the 2 tests points to the interpretation that the populations from which the samples have been drawn differ significantly with respect to both variances and means.
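The Cochran and Cox approximation of Sections 5 and 6 can be sketched as a single function returning both the observed t and the weighted critical value of formula (55). This is a minimal sketch; the tabled values t₁ = 2.262 and t₂ = 2.093 are supplied by hand, and the function name `cochran_cox` is hypothetical rather than a library routine.

```python
import math


def cochran_cox(mean_diff, var1, n1, var2, n2, t1, t2):
    """Approximate test of a common mean when variances differ (Cochran and Cox, 1944).

    t1 and t2 are tabled values of t for n1 - 1 and n2 - 1 degrees of freedom.
    Returns (observed t, value of t required for significance).
    """
    w1 = var1 / n1                           # variance of mean 1, formula (4)
    w2 = var2 / n2                           # variance of mean 2, formula (4)
    se = math.sqrt(w1 + w2)                  # formula (45)
    t_obs = mean_diff / se
    t_req = (w1 * t1 + w2 * t2) / (w1 + w2)  # formula (55)
    return t_obs, t_req


t_obs, t_req = cochran_cox(4.6, 28.42, 10, 6.72, 20, t1=2.262, t2=2.093)
print(f"t = {t_obs:.2f}, required t = {t_req:.2f}, reject: {t_obs >= t_req}")
# t = 2.58, required t = 2.24, reject: True
```

The required value always lies between t₁ and t₂, pulled toward the tabled t of the sample whose mean has the larger variance.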


7. THE INFLUENCE OF HETEROGENEITY OF VARIANCE UPON THE t TEST

It was mentioned before that a significant difference in variances may enhance the value of t obtained in the usual way by pooling the sums of squares and degrees of freedom to arrive at an estimate of s². The example described above may be used to illustrate this situation. The sum of squares within each of the 2 groups may be obtained by multiplying each variance by the corresponding value of n − 1, since Σx² = (n − 1)(s²). For Group 1, we have, then, Σx₁² = (9)(28.42) = 255.78, and for Group 2, we have Σx₂² = (19)(6.72) = 127.68. Then,

$$t = \frac{4.6}{\sqrt{\left(\frac{255.78 + 127.68}{10 + 20 - 2}\right)\left(\frac{1}{10} + \frac{1}{20}\right)}} = \frac{4.6}{1.43} = 3.22$$

and we see that the value of t obtained above is enhanced compared with the value obtained when the standard error of the difference was found by means of formula (45).

10 Professor Cochran, in correspondence, has pointed out that formula (55) was constructed to approximate both the classical and nonclassical approaches to the distribution of t under the conditions stated. The values given by the formula are generally a little more conservative than the tabled values of Fisher and Yates (1943), which are based upon a nonclassical approach.


When the 2 variances differ significantly, as shown by the 
F test, it may sometimes happen that the value of t obtained by
dividing the difference between the means by the standard error 
of the difference obtained by means of formula (45) is not sig- 
nificant, whereas the value of t obtained by dividing the difference 
between the means by the standard error of the difference obtained 
by pooling the sums of squares and degrees of freedom, formula 
(48), is significant. This, of course, would be interpreted as 
evidence that the samples have not been drawn from a common 
population, but the evidence would point only to a difference in 
variances, not a difference in means. 
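As a rough numerical check of this point, the Python sketch below computes both standard errors for the two groups of the example (n = 10 with s² = 28.42, n = 20 with s² = 6.72). It assumes that formula (45) is the separate-variance standard error √(s₁²/n₁ + s₂²/n₂) and that formula (48) is the pooled form used above; the function names are illustrative, not taken from the text.

```python
import math

def pooled_se(s1_sq, n1, s2_sq, n2):
    """Standard error of the difference from pooled sums of squares (assumed formula (48))."""
    ss1 = (n1 - 1) * s1_sq              # sum of squares for group 1
    ss2 = (n2 - 1) * s2_sq              # sum of squares for group 2
    s_sq = (ss1 + ss2) / (n1 + n2 - 2)  # pooled estimate of the variance
    return math.sqrt(s_sq * (1 / n1 + 1 / n2))

def separate_se(s1_sq, n1, s2_sq, n2):
    """Standard error of the difference from separate variances (assumed formula (45))."""
    return math.sqrt(s1_sq / n1 + s2_sq / n2)

se_pooled = pooled_se(28.42, 10, 6.72, 20)
se_separate = separate_se(28.42, 10, 6.72, 20)
print(f"pooled SE = {se_pooled:.3f}, separate SE = {se_separate:.3f}")
# For the same difference between means, pooling enlarges t by this factor:
print(f"ratio = {se_separate / se_pooled:.3f}")
```

With these figures the pooled standard error comes out near 1.43 and the separate-variance one near 1.78, so for any given difference between the means the pooled t is roughly 1.24 times as large, which is the enhancement discussed in this section.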


8. TESTING THE HYPOTHESIS OF A COMMON POPULATION MEAN WHEN n₁ EQUALS n₂


If the number of cases in each group is the same, so that n₁ = n₂ = n, then the standard error of the difference based upon the pooled sum of squares, formula (48), will be exactly equal to the standard error obtained by means of formula (45). With equal n's the 2 are algebraically identical. Then, if F is significant and if n₁ = n₂ = n, and if we wish to test the hypothesis of a common population mean, the t test may be performed in the usual manner by pooling the sums of squares. The value of t which will be regarded as significant, however, will be that obtained by means of formula (55). But since n₁ = n₂ = n, it will also be true that t₁ will equal t₂. Under this condition the value of t which will be regarded as significant will simply be the tabled value for n − 1 degrees of freedom.

Thus, if F is significant and the hypothesis of a common population mean is to be tested, perform the t test in the usual manner, but enter the table of t with one half of the 2n − 2 degrees of freedom that would ordinarily be available. If the obtained value of t equals or exceeds the tabled value for degrees of freedom equal to n − 1, the hypothesis that the samples have a common population mean would be rejected.¹¹
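The two claims of this section can be checked with a short Python sketch. It assumes that formula (55) is the Cochran-Cox weighted value t′ = (w₁t₁ + w₂t₂)/(w₁ + w₂) with wᵢ = sᵢ²/nᵢ, where t₁ and t₂ are the tabled values for n₁ − 1 and n₂ − 1 degrees of freedom. The critical values 2.093 (19 df) and 2.262 (9 df) are the familiar 5 per cent points of t, entered by hand rather than computed, and the variances 10 and 30 are hypothetical.

```python
import math

def pooled_se(s1_sq, n1, s2_sq, n2):
    """Standard error of the difference from the pooled sums of squares."""
    s_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    return math.sqrt(s_sq * (1 / n1 + 1 / n2))

def separate_se(s1_sq, n1, s2_sq, n2):
    """Standard error of the difference from the two separate variances."""
    return math.sqrt(s1_sq / n1 + s2_sq / n2)

def weighted_critical_t(s1_sq, n1, t1, s2_sq, n2, t2):
    """Weighted mean of tabled t values t1 (n1 - 1 df) and t2 (n2 - 1 df),
    with weights s_i^2 / n_i. This is an assumed form of formula (55)."""
    w1, w2 = s1_sq / n1, s2_sq / n2
    return (w1 * t1 + w2 * t2) / (w1 + w2)

n = 20
s1_sq, s2_sq = 10.0, 30.0   # hypothetical, clearly unequal variances
t_19 = 2.093                 # tabled 5 per cent value of t for n - 1 = 19 df

print(pooled_se(s1_sq, n, s2_sq, n), separate_se(s1_sq, n, s2_sq, n))
print(weighted_critical_t(s1_sq, n, t_19, s2_sq, n, t_19))
```

With equal n the two standard errors agree (both are √2 for these hypothetical variances), and because t₁ = t₂ the weighted critical value is simply the tabled t for n − 1 = 19 degrees of freedom, whatever the variances. With the unequal-n figures of the earlier example (n = 10, s² = 28.42 against n = 20, s² = 6.72) the same function returns a value between 2.093 and 2.262, close to the 9-df value, since the smaller group carries most of the weight.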


11 For another approach to the problem of heterogeneity of variance, when the interest is in a difference between two means, see Lewis (1948), p. 196.




HETEROGENEITY OF VARIANCE AND THE t TEST 171


9. EXAMPLES 
1. Here is an easy set of scores for practice in calculation: 
Control Group Experimental Group 
11 15 4 15 10 10 
11 10 4 3 7 12
10 8 8 13 6 14
12 10 9 9 1 8 
8 8 12 9 5 5 


(a) Test for homogeneity of variance. (b) Find the value of t. (c) What value of t will be required for significance at the 5 per cent level?

2. Here is another set of scores for practice: 


Control Group Experimental Group
12 10 16 11 15 4 10 15
8 13 10 10 12 4 7 3
12 11 11 9 10 8 6 13
11 9 10 9 5 9 1 9
11 11 9 7 8 12 5 9


(a) Test for homogeneity of variance. (b) Find the value of t. (c) What value of t will be required for significance at the 5 per cent level?

3. Twenty-five rats were deprived of food for a period of 
22 hr. before a test trial, but were permitted to satisfy thirst 
immediately before the test trial. Another group of 25 rats was 
deprived of food for 22 hr. and was permitted to satisfy thirst 12 
hr. before the test trial. The mean number of responses in the 
test trial for the first group was 21.36, and the value of s² was 147.57. In the second group, the mean was 32.92, and the variance was equal to 489.91. (a) Test the hypothesis that the 2 groups are random samples from a common population by means of the t test. (b) Test the hypothesis that the 2 samples are from populations with a common variance. (c) What value of t will be required for significance at the 5 per cent level, if our interest is in the difference between the means? Data are from Kendler (1945).

4. A group of 23 businessmen and a group of 18 union members were given an attitude test toward Russia. The items in the test consisted of factual and nonfactual statements. The
subjects were told that the tests were information tests, that they 
would probably not know all the answers to the questions, but 
that they were to guess when in doubt. Scores on the test are 
based only upon the nonfactual statements. The hypothesis in- 
volved is that subjects will tend to guess systematically in the 
direction of their own attitudes on the nonfactual items. High 
scores on this particular test indicate systematic errors in the 
direction of favoring Russia. The mean score for the business- 
men was 6.52, and the value of s was equal to 4.47. The mean 
score for the union group was 16.5, and the value of s was equal 
to 1.96. Make whatever tests are necessary in order to de- 
termine whether the means differ significantly. Data are from 
Hammond (1948). 

5. Scores on a visual-motor test were obtained for a group 
of 70 “control” psychiatric cases with diagnoses other than cerebral 
brain damage. Another group of 70 psychiatric cases with diagnoses of cerebral brain damage was also tested. The mean score for the “control” group was 3.5, and the value of s was equal to 4.8. The mean score for the “brain damage” group was 11.6,
and the value of s was equal to 7.3. Make whatever tests are 
necessary in order to determine whether the means differ signifi- 
cantly, irrespective of variances. Data are from Graham and 
Kendall (1946). 

6. A group of 47 rats was tested in a maze placed in a room 
with temperature at 55 to 58 deg. F. Another group of 46 rats 
was tested in a room with temperature at 75 to 79 deg. F. The 
mean number of trials required to learn the maze for the rats in 
the "cold" room was 19.8 with s equal to 7.36. The mean number 
of trials required for learning in the “normal” room was 25.9 
with s equal to 18.3. Make whatever tests are necessary in order 
to determine whether the 2 means differ significantly, irrespective 
of variances. Data are from Moore (1944). 

7. A group of 78 subjects was taught shorthand by the
"word" method, and another group of 108 subjects was taught 
by the "sentence" method. At the end of the first semester, both 
groups were tested on a word test consisting of a list of words 
dictated slowly by the instructor and written in shorthand by 




the students. The mean score of the “word” group was 31.72 
with s equal to 8.01. The mean score for the “sentence” group 
was 35.52 with s equal to 22.93. Make whatever tests are necessary in order to determine whether the 2 means differ significantly.
Data are from Clark and Worcester (1932). 


CHAPTER 10 


An Introduction to the Analysis of Variance 


1. INTRODUCTION 


We have discussed, in the last two chapters, the application of the t test to problems involving the significance of the difference between the means of 2 independent samples. The hypothesis which we were interested in testing was that the samples were randomly drawn from a common normal population. We are now ready to consider methods applicable to testing the hypothesis that several independent samples have been drawn at random from a common normal population. The method which we shall make use of is known as the analysis of variance.

The development of the analysis of variance as a powerful 
tool in experimental and research work is largely the responsibility 
of R. A. Fisher and his co-workers. Fisher, an English statistician 
with training in biological and agricultural research, published the 
first edition of his Statistical methods for research workers in 1925. 
This book, which has gone through many subsequent editions, was 
concerned with the application to problems of research of the more 
recent advances in statistical theory, including the analysis of 
variance. In 1935, the first edition of his Design of experiments 
was published. In this book he dealt more extensively with the 
intimate relationship between experimental design and the analysis 
of variance.¹ It was largely through these two books that American research workers were introduced to the analysis of variance.²

In commenting upon a paper presented by Wishart (1934) before the Royal Statistical Society, Fisher had this to say concerning the early days of the development of the analysis of variance:

1 This book has also gone through subsequent editions, the fourth edition appearing in 1947.

2 For a number of years Fisher was Galton Professor at the University of London. Before this appointment he was at the Rothamsted Experimental Station. Many Americans have gone to England to study and work with Fisher, and he has made several trips to this country to lecture and teach at various universities and institutes of statistics. At the present time Fisher is Balfour Professor of Genetics at Cambridge University.

AN INTRODUCTION TO THE ANALYSIS OF VARIANCE 175


We were together learning how to use the analysis of variance, and 
perhaps it is worth while stating an impression that I have formed—that 
the analysis of variance, which may perhaps be called a statistical method, 
because that term is a very ambiguous one—is not a mathematical 
theorem, but rather a convenient method of arranging the arithmetic. 
Just as in arithmetical text-books—if we can recall their contents—we 
were given rules for arranging how to find the greatest common measure, 
and how to work out a sum in practice, and were drilled in the arrange- 
ment and order in which we were to put the figures down, so with the 
analysis of variance; its one claim to attention lies in its convenience. 
It is convenient in two ways: (1) because it brings to the eyes and to the 
mind a summary of a mass of statistical data in which the logical content 
of the whole is readily appreciated. Probably everyone who has used 
it has found that comparisons which they have not previously thought 
of may obtrude themselves, because there they are, necessary items in the 
analysis. (2) Apart from aiding the logical process, it is convenient, in 
facilitating and reducing to a common form all the tests of significance 
which we may want to apply. I do insist that its claim to attention rests
essentially on its convenience. Nearly always we can, if we choose, put 
our data in other forms and other language. Naturally, like other logical 
arrangements, it is based on mathematical theorems previously proved, 
and in particular the tests of significance were based on problems of 
distribution the solution of which was published for the most part from 1921 to 1924.³


That the analysis of variance has proved to be not only a convenient method, as Fisher says, but also a powerful method of analysis for the research worker is testified to by the extent to which it is being used in the planning, design, and analysis of research in a variety of fields.⁴


3 Wishart (1934), p. 52.

4 The early developments of the analysis of variance were concerned primarily with agricultural research, and it may prove difficult for the social scientist to translate such concepts as "treatments," "levels," "blocks," "plots," and other concepts used in agriculture into terms applicable to research in other fields. The article by Baxter (1941) does much to clarify these concepts in terms of psychological research. Lindquist (1940a) has done the same job for educational research. See also the articles by Garrett and Zubin (1943), Grant (1944, 1948), Shen (1940), Brozek and Alexander (1947), Jackson (1940), and Dunlap (1940).


2. THE PARTITIONING OF THE TOTAL SUM OF SQUARES FOR 
r RANDOM SAMPLES OF n CASES 


Let us start our treatment of the analysis of variance with random sampling from a common normal population. Let us suppose that a variable X is normally distributed in the population. From this population r random samples with n cases each are drawn. Then the total number of cases in the samples will be nr = N. Let the mean of all nr cases be equal to m and the mean of a single sample be equal to $\bar{X}_i$. Then x will represent a deviation of a value of X in a given sample from m, and $x_i$ will represent the deviation of the same value of X from the mean of the sample. Let $d_i$ represent the deviation of the mean of the ith sample from m. In summary, then

X = a normally distributed variable
n = the number of cases in each sample
r = the number of samples
N = nr = the total number of cases
m = the mean of all N measures
$\bar{X}_i$ = the mean of the ith sample
$x = X - m$
$x_i = X - \bar{X}_i$
$d_i = \bar{X}_i - m$


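Then, considering but a single sample,

$$\begin{aligned}
x &= X - m \\
x - x_i &= X - m - (X - \bar{X}_i) \\
x &= x_i + X - m - X + \bar{X}_i \\
x &= x_i + d_i \\
x^2 &= x_i^2 + 2x_i d_i + d_i^2
\end{aligned}$$

Now, summating over the n cases in the sample, we have

$$\sum_{1}^{n} x^2 = \sum_{1}^{n} x_i^2 + 2d_i \sum_{1}^{n} x_i + n d_i^2$$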


And since the sum of the deviations of X from $\bar{X}_i$ is equal to zero, we have

$$\sum_{1}^{n} x^2 = \sum_{1}^{n} x_i^2 + n d_i^2$$


For each of the r samples we shall have an expression similar to the one above. Summating these over all r samples, we have

$$\sum_{1}^{r}\sum_{1}^{n} x^2 = \sum_{1}^{r}\sum_{1}^{n} x_i^2 + n\sum_{1}^{r} d_i^2 \qquad (56)$$

The term on the left in (56) above will be equal to $\sum_{1}^{N}(X - m)^2$ and is commonly called the total sum of squares.


The total sum of squares is thus based upon the deviations of the N values of X from the mean of the combined samples. The first term on the right will be equal to $\sum_{1}^{r}\sum_{1}^{n}(X - \bar{X}_i)^2$ and is commonly called the sum of squares within groups. The sum of squares within groups is based upon the deviations of the various measures from the means of the samples of which they are a part. The second term on the right will be equal to $n\sum_{1}^{r}(\bar{X}_i - m)^2$ and is commonly called the sum of squares between groups. The sum of squares between groups is based upon the deviations of the means of the various samples from the mean of the combined samples.

What we have accomplished here is essentially the first step in the analysis of variance in its simplest form. Given r samples of n cases each, the variation of the nr values of X about the mean of the combined samples can be found. This is the total sum of squares and it, in turn, can be analyzed into two parts: one part based upon the variation within the samples (within groups), and one part based upon the variation of the sample means about the mean of the combined samples (between groups).
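Expression (56) can be verified on any small set of equal-sized samples. The Python sketch below uses three hypothetical samples of four cases each and exact fractions, so the identity total = within + between holds exactly rather than only to rounding; the function name is illustrative.

```python
from fractions import Fraction

def partition_sums_of_squares(samples):
    """Split the total sum of squares of r samples (n cases each) into
    within-groups and between-groups parts, as in expression (56)."""
    n = len(samples[0])
    assert all(len(s) == n for s in samples), "equal n is assumed here"
    all_values = [Fraction(x) for s in samples for x in s]
    m = sum(all_values) / len(all_values)                 # mean of all nr cases
    means = [sum(map(Fraction, s)) / n for s in samples]  # the sample means
    total = sum((x - m) ** 2 for x in all_values)
    within = sum((Fraction(x) - mi) ** 2 for s, mi in zip(samples, means) for x in s)
    between = n * sum((mi - m) ** 2 for mi in means)
    return total, within, between

# Three hypothetical samples of four cases each.
samples = [[3, 5, 7, 9], [4, 6, 6, 8], [1, 2, 6, 7]]
total, within, between = partition_sums_of_squares(samples)
print(total, within, between, total == within + between)
```

Here the within-groups part is 54 and the between-groups part is 32/3, and their sum equals the total sum of squares, 194/3.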


3. THE MEAN SQUARE WITHIN GROUPS 

We have said that X is assumed to be normally distributed in the population. Let us now assume that the entire population has been divided into r random samples of n cases each. Then nr will equal N, which will equal the total number of cases in the population. Now, as pointed out above,


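$$\sum_{1}^{N} x^2 = \sum_{1}^{r}\sum_{1}^{n} x_i^2 + n\sum_{1}^{r} d_i^2$$

may be written

$$\sum_{1}^{N} (X - m)^2 = \sum_{1}^{r}\sum_{1}^{n} (X - \bar{X}_i)^2 + n\sum_{1}^{r} (\bar{X}_i - m)^2$$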


Now, let us divide each side of this equation by N or by nr, the 
equivalent of N, to obtain 


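$$\frac{\sum_{1}^{N}(X - m)^2}{N} = \frac{\sum_{1}^{r}\sum_{1}^{n}(X - \bar{X}_i)^2}{nr} + \frac{n\sum_{1}^{r}(\bar{X}_i - m)^2}{nr}$$

which simplifies to

$$\frac{\sum_{1}^{N}(X - m)^2}{N} = \frac{\sum_{1}^{r}\sum_{1}^{n}(X - \bar{X}_i)^2}{nr} + \frac{\sum_{1}^{r}(\bar{X}_i - m)^2}{r} \qquad (57)$$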


Then, since we have assumed that our samples exhaust the population, m will be equal to the population mean, and the term on the left in the above expression will be the population variance σ². The first term on the right will be the average value of the variances of the r samples, each value being based upon n cases. We may represent this average value by $\sigma_s^2$. The second term on the right will be the variance of the means of the r samples, based upon n cases each, about the population mean or the mean of the sampling distribution. This term will thus be $\sigma_{\bar{X}}^2$. Substituting these symbols, we now have

$$\sigma^2 = \sigma_s^2 + \sigma_{\bar{X}}^2$$
We know also that the standard error of a mean based upon n cases may be written σ/√n. Then the variance of the mean will be obtained by squaring σ/√n and substituting this for $\sigma_{\bar{X}}^2$ in the above expression. We obtain


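$$\begin{aligned}
\sigma^2 &= \sigma_s^2 + \frac{\sigma^2}{n} \\
\sigma_s^2 &= \sigma^2 - \frac{\sigma^2}{n} = \frac{n\sigma^2 - \sigma^2}{n} = \frac{\sigma^2(n - 1)}{n} \\
\sigma^2 &= \frac{n\,\sigma_s^2}{n - 1} = \frac{n}{n - 1}\cdot\frac{\sum_{1}^{r}\sum_{1}^{n}(X - \bar{X}_i)^2}{nr}
\end{aligned}$$

$$\sigma^2 = \frac{\sum_{1}^{r}\sum_{1}^{n}(X - \bar{X}_i)^2}{r(n - 1)} \qquad (58)$$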


From the above expression, we see that the average value of the variances of the r random samples of n cases each will be equal to the population variance if the sum of squares of each sample is divided by n − 1 instead of by n. Since the samples actually used in research work do not exhaust the population, as we assumed they did here, we can say that the sum of squares within r random samples of n cases each, when divided by r(n − 1), will provide an estimate s² of the population variance. In the analysis of variance, this estimate is usually called a mean square. The number of degrees of freedom for the estimate obtained by formula (58) will be given by the denominator r(n − 1) = rn − r = N − r.⁵
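The role of the divisor n − 1 can be illustrated without a normal population. The sketch below is an illustrative assumption rather than the derivation of the text: it takes a tiny hypothetical population {1, 2, 3, 4}, enumerates every ordered sample of n = 3 cases drawn with replacement (all equally likely), and averages the sample sums of squares divided first by n and then by n − 1. Exact fractions make the comparison with σ² exact.

```python
from fractions import Fraction
from itertools import product

def population_variance(values):
    """Variance of a finite population, dividing by the population size."""
    mu = Fraction(sum(values), len(values))
    return sum((Fraction(v) - mu) ** 2 for v in values) / len(values)

def average_sample_variance(values, n, divisor):
    """Average of SS / divisor over every ordered sample of n cases drawn
    with replacement, where SS is the sample's sum of squared deviations."""
    results = []
    for sample in product(values, repeat=n):
        xbar = Fraction(sum(sample), n)
        ss = sum((Fraction(x) - xbar) ** 2 for x in sample)
        results.append(ss / divisor)
    return sum(results) / len(results)

population = [1, 2, 3, 4]   # hypothetical population
n = 3
print(population_variance(population))                # sigma^2
print(average_sample_variance(population, n, n))      # dividing by n
print(average_sample_variance(population, n, n - 1))  # dividing by n - 1
```

Here σ² = 5/4. Dividing by n gives an average of 5/6, that is σ²(n − 1)/n, while dividing by n − 1 recovers 5/4, in line with formula (58). Sampling with replacement keeps the cases independent, which is all this expectation needs; normality enters later, in the test of significance.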

4. THE MEAN SQUARE BETWEEN GROUPS 

It may be noted that the last term on the right of expression (57) is the variance of the r means of n cases each, and we already know that the variance of the means will be 1/n the population variance. Thus,

$$\frac{\sum_{1}^{r}(\bar{X}_i - m)^2}{r} = \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$$

and multiplying by n, we obtain

$$\frac{n\sum_{1}^{r}(\bar{X}_i - m)^2}{r} = n\sigma_{\bar{X}}^2 = \sigma^2 \qquad (59)$$

5 For each set of n observations a mean is calculated, and deviations are taken from this mean. For reasons given earlier, this results in the loss of 1 degree of freedom within each sample.


But, again, the samples we usually have in research do not exhaust 
the population, and the value of m is not the population mean but 
an estimate of this quantity based upon the combined samples. 
Since the mean based upon all nr measures is calculated and used 
as an estimate of the population mean, 1 degree of freedom will be lost here also. Thus, instead of dividing $n\sum_{1}^{r}(\bar{X}_i - m)^2$ by r, we should divide by the number of degrees of freedom for this sum of squares or, in other words, by r − 1. Then this quantity, which is also called a mean square, will also be an estimate s² of the population variance σ².
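The same enumeration idea applies to the mean square between groups. In the sketch below, again a hypothetical population {1, 2, 3, 4} with r = 2 samples of n = 2 cases drawn with replacement, the between-groups sum of squares n Σ(X̄ᵢ − m)² is averaged over every possible draw, once divided by r − 1 and once by r. The function name is illustrative.

```python
from fractions import Fraction
from itertools import product

def population_variance(values):
    """Variance of a finite population, dividing by the population size."""
    mu = Fraction(sum(values), len(values))
    return sum((Fraction(v) - mu) ** 2 for v in values) / len(values)

def average_between_mean_square(values, r, n, divisor):
    """Average of n * sum((mean_i - m)^2) / divisor over every equally likely
    way of drawing r samples of n cases with replacement from `values`."""
    results = []
    for draw in product(values, repeat=r * n):
        samples = [draw[i * n:(i + 1) * n] for i in range(r)]
        means = [Fraction(sum(s), n) for s in samples]
        m = sum(means) / r   # with equal n, the mean of the means is the grand mean
        ss_between = n * sum((mi - m) ** 2 for mi in means)
        results.append(ss_between / divisor)
    return sum(results) / len(results)

population = [1, 2, 3, 4]   # hypothetical population
r, n = 2, 2
print(population_variance(population))
print(average_between_mean_square(population, r, n, r - 1))
print(average_between_mean_square(population, r, n, r))
```

Dividing by r − 1 gives an average of exactly 5/4, the population variance, whereas dividing by r gives 5/8; the r − 1 divisor allows for the degree of freedom lost in estimating the population mean by m.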


5. INDEPENDENCE OF THE MEAN SQUARES 


This completes the analysis of variance, in its simplest form, except for the test of significance. We have obtained two independent estimates⁶ of the population variance based upon the partition of the total sum of squares into 2 parts and a corresponding partition of the total number of degrees of freedom. This partition may be conveniently set up as follows:

Sum of squares:       Total   =   within groups   +   between groups
Degrees of freedom:   N − 1   =   N − r           +   r − 1


The sum of squares within groups and the sum of squares between groups are divided by their associated degrees of freedom to obtain the 2 mean squares which are estimates of the population variance. The mean square based upon the within-groups sum of squares is often called experimental error or the error variance. It measures the uncontrolled variation between subjects treated alike and thus corresponds to a standard error of a difference as used in the t test.⁷ The mean square between groups is a measure of the variation between groups of subjects treated differently, i.e., between the experimental conditions being compared. The sum of squares between groups plus the sum of squares within groups must equal the total sum of squares. Thus, the subtraction of the sum of squares between groups from the total sum of squares leaves the within-groups sum of squares or that portion of the total variation which cannot be accounted for by the experimental conditions. The mean square within groups is independent of, and consequently not influenced by, the mean square between groups.⁸

6 This will be true if the distribution of X in the population is normal. It can be shown that with a normal population, the errors of estimation in the means of the r samples will be independent of the errors in the variances within classes. See Yule and Kendall (1947), pp. 405, 448.


6. THE TEST OF SIGNIFICANCE 

Under the conditions of random sampling from a common 
normal population, the mean square between groups and the 
mean square within groups may be expected to differ only within 
the limits of random sampling. The ratio between the 2 mean squares will give us F, the distribution of which was discovered by Fisher and which has been tabled by Snedecor (1946a) in the form

$$F = \frac{\text{mean square between groups}}{\text{mean square within groups}} \qquad (60)$$

Now in the form of the F test indicated by formula (60) above, if the mean square in the numerator is significantly larger than that in the denominator, the hypothesis of random sampling from a common population will be rejected. This would indicate that the means of the samples differ more than can reasonably be expected in random sampling from a common normal population.


If the probability attaching to F is .05 or less, we would say that the means differ significantly, and if the means correspond to different experimental conditions, we would be inclined to attribute the observed differences to the experimental conditions.⁹ If the value of the numerator of the F ratio is smaller than that of the denominator, it would, of course, not be necessary to compute the value of F.¹⁰

7 It is common practice to refer to any sum of squares which is assumed to represent uncontrolled variation as a residual sum of squares. The variance derived from this sum of squares is called a residual variance. Thus, error variance and residual variance are often used interchangeably. In this sense the mean square within groups is also a residual variance.

8 As will be described later, it may be necessary in limited cases to transform the scale of measurement in order to assure the independence of the 2 mean squares. This will be true when X is not normally distributed.

Since the hypothesis that the research worker is interested 
in testing will involve placing the mean square between groups in 
the numerator of the F ratio, and since the hypothesis will be 
rejected only if the numerator is significantly larger than the 
denominator, the tabled values of F (Table VIII, Appendix) are 
those which are convenient for this case. The tabled values are 
the 5 and 1 per cent points of the F distribution and involve only 
the area in the right tail. They correspond to a one-tailed test of 
significance, not a two-tailed test. 


7. AN ANALYSIS OF VARIANCE FOR 2 GROUPS 


Let us first illustrate the application of the analysis of 
variance to the case of 2 independent groups and show the relation 
between the F and t tests in this situation before considering the 
case of several independent groups. We may take the data 
previously evaluated by means of the t test, the data of Table 18, 
concerning the retention of subjects after a 4-hr. and after an 
8-hr. interval. Let us use the symbol m to represent the mean of 
all N cases, and the other symbols will have the same meaning as 
before. The formulas given below are those which are convenient 
for calculating the necessary sums of squares. 

9 Assuming, of course, that proper experimental controls have been provided.

10 It is possible to test the hypothesis that the mean square between groups is significantly smaller than that within groups, and a method of making this test will be shown later. It is difficult, however, to understand what interest the experimenter would have in this hypothesis and what reasonable experimental interpretation could be placed upon a significantly small value of F. Snedecor (1946b) states that he has occasionally observed such a small value, but that in every instance it "seemed to be one of those unusual ones that occur occasionally in sampling from a homogeneous population."




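$$\begin{aligned}
\text{Total} &= \sum_{1}^{N}(X - m)^2 = \sum_{1}^{N}X^2 - \frac{\left(\sum_{1}^{N}X\right)^2}{N} \\
&= (12)^2 + (16)^2 + (7)^2 + \cdots + (3)^2 - \frac{(380)^2}{40} \\
&= 4{,}118 - 3{,}610 \\
&= 508
\end{aligned}$$

$$\begin{aligned}
\text{Between} &= \sum_{1}^{r} n(\bar{X}_i - m)^2 = \sum_{1}^{r}\frac{\left(\sum_{1}^{n}X\right)^2}{n} - \frac{\left(\sum_{1}^{N}X\right)^2}{N} \\
&= \frac{(220)^2}{20} + \frac{(160)^2}{20} - \frac{(380)^2}{40} \\
&= 3{,}700 - 3{,}610 \\
&= 90
\end{aligned}$$

$$\text{Within} = \sum_{1}^{r}\sum_{1}^{n}(X - \bar{X}_i)^2 = \sum_{1}^{r}\left[\sum_{1}^{n}X^2 - \frac{\left(\sum_{1}^{n}X\right)^2}{n}\right]$$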


The sum of squares within groups may, of course, be obtained most easily by subtracting the sum of squares between groups from the total sum of squares. Its direct calculation, however, provides a check on the arithmetic. The sums of squares, the degrees of freedom corresponding to the sums of squares, the mean squares, and the value of F may all be presented as in Table 19, which summarizes the analysis. F is obtained by dividing the mean square between groups by the mean square within groups. Thus,

F = 90/11 = 8.18

TABLE 19. Summary of the Analysis of Variance of the Retention Scores of a Group Tested 4 Hr. after Learning and a Group Tested 8 Hr. after Learning (Original Data in Table 18)

Source of Variation    Sum of Squares    df    Mean Square    F
Between groups                90           1        90        8.18
Within groups                418          38        11
Total                        508          39

The hypothesis that is tested by F is that the 2 groups are random samples from a common normal population. We are interested in the probability of obtaining an F as large as 8.18 under this hypothesis. If this probability is .05 or less, we shall consider the hypothesis untenable. In order to determine this
probability, we enter the column of the table of F with the degrees 
of freedom corresponding to the numerator of the ratio and find 
the row entry corresponding to the degrees of freedom of the 
denominator. In this problem we have 1 degree of freedom for 
the numerator, and we enter the first column of the table. We 
have 38 degrees of freedom for the denominator, and the tabled 
value of F significant at the 5 per cent point for 1 and 38 degrees 
of freedom is 4.10 and at the 1 per cent point it is 7.35. Our ob- 
tained value of F of 8.18 is larger than the tabled value at the 1 per 
cent point and therefore has a probability of less than .01. We thus 
reject the hypothesis of random sampling from a common normal 
population and arrive at exactly the same conclusion that we 
reached by means of the t test.¹¹
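The computing formulas above can be collected into a small function. The Python sketch below (the function name is hypothetical) works only from the group totals, the common n, and the grand sum of X², which is all the text reports here for the data of Table 18. It reproduces the entries of Table 19 and also prints √F, which for a numerator with a single degree of freedom equals t.

```python
import math

def anova_from_totals(group_sums, n, sum_x_squared):
    """One-way analysis of variance from group totals (equal n per group)
    and the grand sum of X^2, using the computing formulas of this section."""
    r = len(group_sums)
    N = r * n
    correction = sum(group_sums) ** 2 / N            # (sum X)^2 / N
    total = sum_x_squared - correction
    between = sum(t ** 2 for t in group_sums) / n - correction
    within = total - between
    ms_between = between / (r - 1)
    ms_within = within / (N - r)
    return total, between, within, ms_between / ms_within

# Retention example: totals 220 and 160 for 20 subjects each; sum of X^2 = 4,118.
total, between, within, F = anova_from_totals([220, 160], 20, 4118)
print(total, between, within, round(F, 2))
print("t = sqrt(F) =", round(math.sqrt(F), 2))
```

This yields 508, 90, and 418 for the three sums of squares and F = 90/11 ≈ 8.18, with √F ≈ 2.86.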

When we have but 2 means to be compared and where n₁ = n₂ = n, as in the present problem, there is a simplified method for obtaining the sum of squares between groups. We merely take the difference between the two sums, ΣX₁ − ΣX₂, square the difference, and divide by kn, where k is the sum of the squares of the coefficients of the sums. Thus, we have


$$\text{Between groups} = \frac{\left(\sum X_1 - \sum X_2\right)^2}{kn} \qquad (61)$$

and substituting the 2 sums 220 and 160, we obtain

$$\text{Between groups} = \frac{(220 - 160)^2}{(2)(20)} = \frac{(60)^2}{40} = 90$$


11 It may be shown that in this case, where the numerator of F has but a single degree of freedom, √F = t.


The value of k is obtained in this way: The coefficients of the sums 220 and 160 are +1 and −1. Squaring +1 and −1 and summing, we obtain the value of k = 2. This method of obtaining the sum of squares between groups when only 2 means are to be compared and when n₁ is equal to n₂ is extremely rapid and convenient.
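Formula (61) can be written for arbitrary coefficients attached to the group totals, with k the sum of their squares. The sketch below does this; only the two-group case with coefficients +1 and −1 is the one given in the text, and the general form assumes the same n in every group. The function name is illustrative.

```python
def contrast_sum_of_squares(sums, coefficients, n):
    """Sum of squares for a comparison of group totals, each based on n cases:
    (sum of c_i * T_i)^2 / (k * n), where k is the sum of squared coefficients."""
    k = sum(c ** 2 for c in coefficients)
    contrast = sum(c * t for c, t in zip(coefficients, sums))
    return contrast ** 2 / (k * n)

# Formula (61) with the totals 220 and 160, n = 20, coefficients +1 and -1.
print(contrast_sum_of_squares([220, 160], [1, -1], 20))   # 90.0
```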


8. SOME PROBLEMS TO WHICH THE ANALYSIS OF VARIANCE 
MIGHT BE APPLIED 


The very great value of the analysis of variance and the test 
of significance based upon the F distribution is not in its application 
to the problem of 2 sample means, but in problems where the 
differences among a set of several means are to be evaluated. 
Such problems occur frequently in research, particularly in the exploratory stages of an investigation. For example, we may be interested in the influence of differing intensities of illumination upon reading speed or eye fatigue. If we were to limit the investigation to a comparison of but 2 different intensities of illumination, how should we decide which 2 to investigate? We should
probably take 2 intensities which were fairly well separated, but 
with the analysis of variance we could include various degrees of 
intensity between these two extremes in our investigation. 

Similar problems would be faced in the investigation of the effect of different sizes of type upon legibility; the influence of
different periods of food deprivation upon learning; the influence 
of different numbers of reinforcements upon conditioning and 
extinction; the influence of different methods of instruction upon 
achievement; the influence of different sets of verbal instructions 
upon problem solving; the influence of different sensory cues upon 
maze learning; the influence of different kinds of motivation upon 
performance; the influence of different periods of rest upon fatigue; 
the influence of different kinds of interpolated activities upon 
learning; the influence of different kinds of work situations upon
production and fatigue; the influence of different periods of prac- 
tice upon learning. Many other problems could be enumerated, 
but the ones listed should indicate the utility of the analysis of variance in its simplest form.




9. AN EXPERIMENT INVOLVING 5 EXPERIMENTAL CONDITIONS 


To illustrate the application of the analysis of variance to a problem involving several experimental conditions, let us suppose that 40 subjects have been divided at random into 5 groups of 8 subjects each. These groups are then assigned at random to different sets of experimental conditions. The outcomes of this hypothetical experiment are given in Table 20. We proceed as before to calculate the necessary sums of squares from the data of the table.

TABLE 20. Outcomes of a Hypothetical Experiment in Which 40 Subjects Were Divided at Random into 5 Groups of 8 Subjects Each (the Various Groups Are Then Assumed to Have Been Tested under Different Experimental Conditions)

                      Experimental Conditions
Subjects       1       2       3       4       5     Total
1             14      16       2       7       5
2             16       7      10      10       9
3              3      10       9      10      10
4             10       4      13      13       7
5              9       7      11       3      12
6             10      23       9      11      17
7             21      12      13       7      14
8             17      13       9      11      22
Sum          100      92      76      72      96       436
ΣX²        1,472   1,312     806     718   1,368     5,676


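$$\begin{aligned}
\text{Total} &= (14)^2 + (16)^2 + (3)^2 + \cdots + (22)^2 - \frac{(436)^2}{40} \\
&= 5{,}676.0 - 4{,}752.4 \\
&= 923.6
\end{aligned}$$

$$\begin{aligned}
\text{Between} &= \frac{(100)^2}{8} + \frac{(92)^2}{8} + \frac{(76)^2}{8} + \frac{(72)^2}{8} + \frac{(96)^2}{8} - \frac{(436)^2}{40} \\
&= 4{,}830.0 - 4{,}752.4 \\
&= 77.6
\end{aligned}$$

$$\begin{aligned}
\text{Within} &= \left[1{,}472 - \frac{(100)^2}{8}\right] + \left[1{,}312 - \frac{(92)^2}{8}\right] + \cdots + \left[1{,}368 - \frac{(96)^2}{8}\right] \\
&= 222 + 254 + 84 + 70 + 216 \\
&= 846.0
\end{aligned}$$

As a check upon the arithmetic, we have:

Total − between = within

or

923.6 − 77.6 = 846.0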
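The whole analysis of Table 20 can be reproduced in a few lines. The sketch below transcribes the table by columns (the entries agree with the printed column sums and sums of squares) and applies the same computing formulas with exact fractions; the function name is illustrative.

```python
from fractions import Fraction

# Scores of Table 20, one list per experimental condition (8 subjects each).
table_20 = [
    [14, 16, 3, 10, 9, 10, 21, 17],
    [16, 7, 10, 4, 7, 23, 12, 13],
    [2, 10, 9, 13, 11, 9, 13, 9],
    [7, 10, 10, 13, 3, 11, 7, 11],
    [5, 9, 10, 7, 12, 17, 14, 22],
]

def one_way_anova(groups):
    """Return (SS total, SS between, SS within, df between, df within, F)
    using sum X^2 - (sum X)^2 / N and the corresponding group formulas."""
    r = len(groups)
    N = sum(len(g) for g in groups)
    correction = Fraction(sum(sum(g) for g in groups) ** 2, N)
    ss_total = sum(x * x for g in groups for x in g) - correction
    ss_between = sum(Fraction(sum(g) ** 2, len(g)) for g in groups) - correction
    ss_within = ss_total - ss_between
    df_between, df_within = r - 1, N - r
    F = (ss_between / df_between) / (ss_within / df_within)
    return ss_total, ss_between, ss_within, df_between, df_within, F

ss_t, ss_b, ss_w, df_b, df_w, F = one_way_anova(table_20)
print(float(ss_t), float(ss_b), float(ss_w), df_b, df_w, round(float(F), 2))
```

It returns 923.6, 77.6, and 846.0 for the sums of squares, 4 and 35 degrees of freedom, and F = 19.40/24.17 ≈ 0.80, which is below 1.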


The results of these calculations are given in Table 21 where the analysis of variance is completed.

TABLE 21. Analysis of Variance of the Outcomes of an Experiment in Which 5 Groups of Subjects Were Tested under Different Experimental Conditions (Original Data in Table 20)

Source of Variation    Sum of Squares    df    Mean Square    F
Between groups              77.6           4      19.40
Within groups              846.0          35      24.17
Total                      923.6          39

Since the mean square between groups is smaller than the mean square within groups, the value of F as obtained by formula (60) will be less than 1 and will obviously not be significant, and the hypothesis of random sampling from a common population would be regarded as tenable. As a matter of fact the values of X in Table 20 were actually drawn at random from a common normal population. Values of X corresponding to a normal distribution were placed on disks, and the disks were then thoroughly mixed in a box. From the box a single disk was drawn. The value of X on the disk was recorded, and the disk was replaced in the box. The disks were again mixed, and another single one was drawn from the box. This process was continued until 8 disks had been drawn. The values on these 8 disks correspond to the values of X for Experimental Condition 1 in Table 20. The values for the other experimental conditions were obtained in the same way. Thus, we have no basis for expecting the means of these samples to differ significantly, unless a 1 in 20 chance has occurred in the sampling.




These samples are, in fact, drawn from a common normal 
population. 

We may easily alter the data of Table 20 in order to illustrate 
what might have been obtained if the 5 groups of subjects had 
actually been subjected to different experimental conditions and 
if these conditions had influenced the means observed. Let us 
assume that the mean for Experimental Condition 1 will be 2 
points higher and that this increase is made by adding 2 points to 
each of the values for Condition 1. A similar increase has been 
made in the values for Experimental Condition 5. The mean for 
Experimental Condition 4 has been lowered by 2 points by sub- 
tracting 2 points from each of the values for this condition. The 
altered data are given in Table 22. 

TABLE 22. Hypothetical Scores of 40 Subjects Divided at Random into 5 Groups of 8 Subjects Each: the Groups Are Then Assumed to Have Been Tested under Different Experimental Conditions

                   Experimental Conditions
Subjects     1        2        3        4        5
1           16       16        2        5        7
2           18        7       10        8       11
3            5       10        9        8       12
4           12        4       13       11        9
5           11        7       11        1       14
6           12       23        9        9       19
7           23       12       13        5       16
8           19       13        9        9       24
Sum        116       92       76       56      112       452
ΣX²      1,904    1,312      806      462    1,784     6,268


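Proceeding, as before, to calculate the necessary sums of squares, we have

$$\text{Total} = (16)^2 + (18)^2 + (5)^2 + \cdots + (24)^2 - \frac{(452)^2}{40} = 1{,}160.4$$

$$\text{Between} = \frac{(116)^2}{8} + \frac{(92)^2}{8} + \cdots + \frac{(112)^2}{8} - \frac{(452)^2}{40} = 314.4$$

$$\text{Within} = \text{total} - \text{between} = 1{,}160.4 - 314.4 = 846.0$$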
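The arithmetic above can be reproduced in a few lines. This is a minimal sketch that, like the text, works only from the "Sum" and "ΣX²" rows of Table 22 for Conditions 1 to 5; the variable names are illustrative.

```python
# Sums of squares for Table 22 from its "Sum" and "ΣX²" rows.
group_sums = [116, 92, 76, 56, 112]            # ΣX for Conditions 1-5
group_sum_sq = [1904, 1312, 806, 462, 1784]    # ΣX² for Conditions 1-5
n = 8                                          # subjects per group

N = n * len(group_sums)                        # 40 subjects in all
correction = sum(group_sums) ** 2 / N          # (452)² / 40

total_ss = sum(group_sum_sq) - correction
between_ss = sum(s ** 2 / n for s in group_sums) - correction
within_ss = total_ss - between_ss              # by subtraction, as in the text

print(f"total={total_ss:.1f}  between={between_ss:.1f}  within={within_ss:.1f}")
# prints: total=1160.4  between=314.4  within=846.0
```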




It may be observed that the sum of squares within groups has not been changed by the addition or subtraction of a constant from the scores within a given group. If we had subtracted or added exactly the same constant for every value of X, all the sums of squares would have remained exactly the same as in the previous analysis. This fact may be used to facilitate the computations when values of X are recorded in terms of several figures. Subtracting the same constant from all N measures will have no influence upon the subsequent application of the analysis of variance.
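A quick numerical check of this property, as a sketch using the Condition 1 scores of Table 22 (the helper name `within_ss` is illustrative): shifting every score in a group by a constant moves the group mean by the same amount, so the deviations about the mean, and hence the group's sum of squares, are unchanged.

```python
def within_ss(scores):
    """Sum of squared deviations of the scores about their own mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)

condition_1 = [16, 18, 5, 12, 11, 12, 23, 19]   # Condition 1 of Table 22
lowered = [x - 2 for x in condition_1]           # remove the 2 points added in the text

print(within_ss(condition_1), within_ss(lowered))   # prints: 222.0 222.0
```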
The results of our calculations are summarized in Table 23. 


TABLE 23. Analysis of Variance of Hypothetical Scores of 5 Groups of Subjects Tested under Different Experimental Conditions: Original Data in Table 22

Source of Variation    Sum of Squares    df    Mean Square    F
Between groups         314.4             4     78.60          3.252
Within groups          846.0             35    24.17
Total                  1,160.4           39


We see that the value of F is now 3.252. From the table of F we 
find that for 4 and 35 degrees of freedom a value of F equal to 
3.252 will have a probability of less than .05 and hence may be 
regarded as significant. If the data of Table 22 had actually 
represented the results obtained under different experimental 
conditions, the inference would be, since the hypothesis of random 
sampling from a common normal population is rejected, that the 
experimental conditions have produced significant differences in 


the means of the various groups. 
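As a sketch of how the entries of Table 23 follow from one another: each mean square is a sum of squares divided by its degrees of freedom, and F is the mean square between groups over the mean square within groups. The code does not look up the tabled F; the result must still be compared with the table for 4 and 35 degrees of freedom.

```python
between_ss, between_df = 314.4, 4     # from Table 23
within_ss, within_df = 846.0, 35

ms_between = between_ss / between_df  # mean square between groups
ms_within = within_ss / within_df     # mean square within groups
F = ms_between / ms_within

print(f"{ms_between:.2f} {ms_within:.2f} {F:.3f}")   # prints: 78.60 24.17 3.252
```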


10. SUMMARY OF THE CALCULATIONS 


In the illustrations cited, the analysis of variance has been applied to groups with the same number of cases in each group. This is not a necessary condition. We may have differing numbers of observations within each of the groups and still partition the total sum of squares and the degrees of freedom. A convenient method of arranging the data, whether the n's are equal or different, is shown in Table 24. There also we have given a summary of the necessary calculations and the partitioning of the degrees of freedom for easy reference. For convenience we shall indicate the individuals by the subscripts 1, 2, 3, . . . , i, . . . , n, and the groups by the subscripts 1, 2, 3, . . . , j, . . . , r. Thus, $X_{12}$ indicates the first subject in Group 2, and, in general, $X_{ij}$ will indicate the ith subject in the jth group. The summation of X written without any subscript refers to the summation over all N cases. The limits of the other summations should be clear from the table.

TABLE 24. Summary of the Calculations in a Two-Part Analysis of Variance

$$
\begin{array}{lcccccc}
\text{Subjects} & \text{Group 1} & \text{Group 2} & \cdots & \text{Group } j & \cdots & \text{Group } r \\
1 & X_{11} & X_{12} & \cdots & X_{1j} & \cdots & X_{1r} \\
2 & X_{21} & X_{22} & \cdots & X_{2j} & \cdots & X_{2r} \\
3 & X_{31} & X_{32} & \cdots & X_{3j} & \cdots & X_{3r} \\
i & X_{i1} & X_{i2} & \cdots & X_{ij} & \cdots & X_{ir} \\
n & X_{n1} & X_{n2} & \cdots & X_{nj} & \cdots & X_{nr} \\
\text{Sum} & \sum X_{\cdot 1} & \sum X_{\cdot 2} & \cdots & \sum X_{\cdot j} & \cdots & \sum X_{\cdot r}
\end{array}
$$

Computations:
1. Total sum of scores $= \sum X = \sum X_{\cdot 1} + \sum X_{\cdot 2} + \cdots + \sum X_{\cdot r}$
2. Correction for origin $= \dfrac{(\sum X)^2}{N}$
3. Total sum of squares $= \sum X^2 - \dfrac{(\sum X)^2}{N}$
4. Sum of squares between groups $= \dfrac{(\sum X_{\cdot 1})^2}{n_1} + \dfrac{(\sum X_{\cdot 2})^2}{n_2} + \cdots + \dfrac{(\sum X_{\cdot r})^2}{n_r} - \dfrac{(\sum X)^2}{N}$
5. Sum of squares within groups = total − between groups

Degrees of freedom:
1. Between groups = r − 1
2. Within groups = N − r
3. Total = N − 1
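The steps of Table 24 translate directly into a short function. This is a sketch, not the author's procedure: `two_part_anova` is a hypothetical name, the groups are passed as lists of scores that may differ in length, and F is formed from the two mean squares as in Table 23.

```python
from typing import Dict, Sequence

def two_part_anova(groups: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Two-part analysis of variance following the steps listed in Table 24.

    `groups` holds the scores of each group; the groups need not be of equal size.
    """
    r = len(groups)
    N = sum(len(g) for g in groups)
    sum_x = sum(sum(g) for g in groups)                              # 1. ΣX
    correction = sum_x ** 2 / N                                      # 2. (ΣX)²/N
    total = sum(x * x for g in groups for x in g) - correction       # 3. total SS
    between = sum(sum(g) ** 2 / len(g) for g in groups) - correction # 4. between SS
    within = total - between                                         # 5. by subtraction
    df_between, df_within = r - 1, N - r
    return {
        "total": total, "between": between, "within": within,
        "df_between": df_between, "df_within": df_within, "df_total": N - 1,
        "F": (between / df_between) / (within / df_within),
    }

# Hypothetical unequal groups, small enough to check by hand:
result = two_part_anova([[1, 2, 3], [4, 6]])
print(result["between"], result["within"], result["F"])   # about 10.8, 4.0 and 8.1
```

Applied to the five columns of Table 22, the same function should reproduce the F of 3.252 in Table 23.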




11. EXAMPLES 

1. Here is an easy example for practice. Assume that sub- 
jects were assigned at random to Groups A, B, and C, and that 
each group was tested under a different experimental condition. 
The data are as follows: 


Group A Group B Group C 
27 22 37 
45 24 38 
44 42 25 
31 41 47 
38 31 23 


Test the hypothesis that the samples are from a common popu- 
lation. 

2. In the Morgan (1945) experiment, Example 4, Chapter 8, 
t was used to evaluate the difference between the means of the 
"unrestricted hypothesis" group and the “restricted hypothesis” 
group. Using the same data, find the mean square between groups, 


the mean square within groups, and the value of F. 
3. Here is an easy set of measurements for practice. Values 


of X were drawn at random from a sampling box containing a 
normal distribution. Test the hypothesis that the samples were 


randomly drawn from a common population. 
Sample 1 Sample 2 Sample 3 Sample 4 Sample 5 Sample 6
9 9 6 6 12 10 
12 10 12 9 6 6 
7 8 9 12 8 8 
14 3 9 7 7 12 
5 7 8 5 3 9 
8 8 10 5 13 4 
7 5 2 8 7 8 
7 9 10 9 13 7 
8 3 9 6 6 6 
3 8 12 3 6 2 


4. Subjects were assigned at random to one of 3 groups. 


Three sets of instructions were prepared explaining how to put 
together a simple piece of apparatus. Measures are in terms of 
speed of assembly. Determine whether the means differ signifi- 


cantly by finding the value of F. 




Instructions Instructions Instructions 
I II III 
22 21 32 
35 44 23 
45 35 22 
24 40 41 
43 35 44 
38 22 32 
23 50 18 
30 28 22 


5. Given the following 4 random samples, test the hypothesis 
that they are from a common population. 


Sample 1 Sample 2 Sample 3 Sample 4 
20 24 20 27 
18 22 22 35 
26 25 30 18 
19 25 27 24 
26 20 22 28 
24 21 24 32 
26 34 28 16 

18 21 18 
32 23 25 
23 25 
22 18 

30 

32 


6. Samples were drawn in the manner of Example 3 above, 
but in this case some of the sample means were increased by adding 
a constant to every measure in the sample and other means were 
decreased by subtracting a constant. Test the hypothesis that the 
samples are from a common population by finding the value of F. 


Sample 1 Sample 2 Sample 3 Sample 4 Sample 5 
13 7 12 10 13
9 4 i 12 6 
8 4 4 9 14 
7 1 9 7 12 
8 10 5 15 13 
6 7 10 14 10 
6 5 2 10 8 
ri 9 8 17 4 
6 5 3 14 9 
10 8 6 12 11 




7. The analysis of variance should prove to be an extremely useful tool in problems which involve a “test of technique,” i.e., where the experimenter is not sure that he can reproduce his results. Such failures may be the result of inability to standardize and thus control the conditions of the experiment. They may also be due to unreliable observers or unreliable measuring devices or other factors. The problem cited here happens to involve the observers, and was carried out under the direction of Loucks (1948). Subjects were assigned at random to one of four graduate assistants, here referred to as Operators A, B, C, and D. The operators did not test an equal number of subjects, but each operator observed his particular subjects perform under supposedly the same set of conditions. Records were kept of a number of different variables. The one reported here concerns but one phase of the study, the errors made in making turns in an airplane trainer. The records for each operator for his particular group of subjects are given below:


Subjects Operator A Operator B Operator C Operator D
1 6 5 4 8
2 4 3 ? 5
3 3 4 3 6
4 7 3 7 7
5 13 3 4 7
6 9 4 8 9
7 4 0 7 7
8 10 3 4 10
9 8 4 8 4
10 9 4 4 3
11 8 3 11 3
12 5 3 13
13 5 4 9
14 10 2
15 9 5
16 15 3
17 10 3
18 6 1
19 4 2
20 5


Although there is some tendency for the means and standard deviations to be proportional, indicating the desirability of a logarithmic transformation (see p. 202) of the data, we shall analyze the original measures given above. (a) Find the sum of squares between groups and within groups and the value of F. (b) Now repeat the analysis using only the data for Operators A, C, and D. (c) Assuming the validity of the analysis of variance, what interpretation would you place upon these results?


CHAPTER 11 


Heterogeneity of Variance and 
Transformations of the Scale 


1. INTRODUCTION 


It has been pointed out that the analysis of variance is a technique for testing the hypothesis that several samples have been drawn at random from a common normal population.¹ If F is significant, then this hypothesis is rejected at a defined standard of significance. It may appear, however, that the variances within the various samples are quite dissimilar, and the experimenter may be worried about conclusions concerning the experimental means. A supplementary test of the hypothesis that the samples are random samples from populations with a common variance may be desirable under this circumstance.²


2. THE TEST FOR HOMOGENEITY OF VARIANCE: EQUAL n's 


A test for homogeneity of variance is described by Bartlett 
(1937), and we may show the necessary calculations in the case 
of a problem which we have already treated by the analysis of 
variance. The original data are given in Table 22. We first 
find the sum of squares within each of the 5 groups. For the first group, we will have

$$\sum x_1^2 = \sum X_1^2 - \frac{(\sum X_1)^2}{n} = 1{,}904 - \frac{(116)^2}{8} = 222.0$$

1 The analysis of variance is also used to solve another class of problems: 
the estimation of the components of variation associated with a composite 
population. These problems are discussed in detail by Eisenhart (1947a), 
Crump (1946), and Daniels (1939).

2 A test of this hypothesis for the case of but 2 groups has been described 
previously (pp. 163-165). Here we are concerned with several groups. 



The sums of squares within each of the other groups are found in the same way and have been entered in Table 25. In the fifth column of Table 25, we have entered the variance s² for each group. This is obtained by dividing the sums of squares by the corresponding number of degrees of freedom. In the sixth column the logarithms of the variances have been entered.

TABLE 25. Bartlett's Test of Homogeneity of Variance for r Groups with n Subjects in Each Group

Group    n    df    Σx²       s²       log s²
1        8    7     222.00    31.71    1.50120
2        8    7     254.00    36.29    1.55979
3        8    7     84.00     12.00    1.07918
4        8    7     70.00     10.00    1.00000
5        8    7     216.00    30.86    1.48940
Sum                           120.86   6.62957

Computations:
1. $\bar{s}^2 = \dfrac{\sum x^2}{r(n-1)} = \dfrac{846.0}{35} = 24.17$; $\log \bar{s}^2 = 1.38328$
2. $(r)(\log \bar{s}^2) = (5)(1.38328) = 6.91640$
3. $\text{Diff.} = (r)(\log \bar{s}^2) - \sum \log s^2 = 6.91640 - 6.62957 = .28683$
4. $\chi^2 = (2.3026)(n-1)(\text{diff.}) = (2.3026)(7)(.28683) = 4.623$
5. $\text{Correction} = 1 + \dfrac{r+1}{3r(n-1)} = 1 + \dfrac{6}{(3)(5)(7)} = 1.057$
6. $\text{Corrected } \chi^2 = \chi^2/\text{correction} = 4.623/1.057 = 4.374$

Under the hypothesis of random sampling from a population or populations with a common variance, the values of s² recorded in column 5 of the table are all estimates of the same parameter. If the hypothesis is true, then these values of s² should not differ any more than is to be expected in random sampling from a common population. The test of significance for this hypothesis is made by means of χ², and the number of degrees of freedom for


TRANSFORMATIONS OF THE SCALE 197 


evaluating χ² will be 1 less than the number of samples or r − 1. The necessary calculations should be clear from an examination of Table 25.³

The value of χ² calculated in line 4 of Table 25 is somewhat biased in that it tends to exaggerate the significance level. If the value of χ² as found in line 4 is significant, then the “correction” as shown in line 5 should be calculated. Then the “corrected” χ² obtained in line 6 will give a more accurate value for interpretation. The corrected χ² will always be less than the value obtained in line 4. Thus, if the value first found is not significant, there will be no need of applying the correction. In the present problem, for example, since the χ² required for significance for r − 1 = 4 degrees of freedom is 9.488, there would really be no necessity of obtaining the corrected value of χ². It is obvious that the obtained value of 4.623 is not significant, and since the corrected value will be even smaller, it also will not be significant.

The χ² test, applied to the variances of the 5 samples, gives support to the belief that these samples are not heterogeneous in variance. Since the obtained value of χ² is not significant, the data offer no evidence against the hypothesis of random sampling from a population or populations with a common variance. Since the F test as applied to the same data has resulted in the rejection of the hypothesis of random sampling from a common population, and since the χ² test indicates that it is not the variances which differ significantly, we have every reason to believe that the significance of the value of F is the result of differences in the means.
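Lines 1 to 6 of Table 25 can be followed step by step in code. This is a sketch under the equal-n layout of the table (the function name is hypothetical); it keeps the common logarithms and the factor 2.3026 used there. Because it starts from the unrounded sums of squares rather than from variances rounded to two decimals, its results may differ from Table 25 by a few thousandths.

```python
import math

def bartlett_equal_n(within_ss, n):
    """Bartlett's test for r groups of n cases each, laid out as in Table 25.

    within_ss holds Σx², the sum of squares within each group.
    Returns (chi_square, correction, corrected_chi_square); df = r - 1.
    """
    r = len(within_ss)
    variances = [ss / (n - 1) for ss in within_ss]                # column s²
    pooled = sum(within_ss) / (r * (n - 1))                       # line 1
    diff = r * math.log10(pooled) - sum(math.log10(s2) for s2 in variances)  # lines 2-3
    chi_square = 2.3026 * (n - 1) * diff                          # line 4 (2.3026 ≈ ln 10)
    correction = 1 + (r + 1) / (3 * r * (n - 1))                  # line 5
    return chi_square, correction, chi_square / correction        # line 6

chi2, c, corrected = bartlett_equal_n([222.0, 254.0, 84.0, 70.0, 216.0], n=8)
print(f"chi-square {chi2:.3f}, correction {c:.3f}, corrected {corrected:.3f}")
```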


3. THE TEST FOR HOMOGENEITY OF VARIANCE: UNEQUAL n's 


The test for homogeneity of variance with an unequal number of observations within each of the various groups differs but little from the test as applied to groups with equal n's. As a guide in applying the test, an example is given in Table 26, along with the necessary calculations. The value of χ², in this instance, will also have degrees of freedom equal to 1 less than the number of samples on which it is based. Since it is obvious here also that the value of χ² obtained in line 4 is not significant, the application of the correction is not necessary. In the case of borderline significance of the value of χ² obtained in line 4, the final decision should be based upon the corrected value as obtained in line 6.

3 2.3026 in line 4 of the table is log_e 10 and takes into account the fact that we have used the more familiar common logarithms (log₁₀) for s² instead of natural logarithms (log_e).

TABLE 26. Bartlett's Test for Homogeneity of Variance for r Groups with Unequal n's

Group    n     n − 1    1/(n − 1)    Σx²       s²       log s²     (n − 1)(log s²)
1        21    20       .05000       244.00    12.20    1.08636    21.72720
2        13    12       .08333       162.00    13.50    1.13033    13.56396
3        15    14       .07143       110.00    7.86     .89542     12.53588
4        10    9        .11111       98.00     10.89    1.03703    9.33327
Sum            55       .31587       614.00                        57.16031

Computations:
1. $\bar{s}^2 = \dfrac{\sum x^2}{\sum (n-1)} = \dfrac{614.00}{55} = 11.16$; $\log \bar{s}^2 = 1.04766$
2. $[\sum (n-1)][\log \bar{s}^2] = (55)(1.04766) = 57.62130$
3. $\text{Diff.} = [\sum (n-1)][\log \bar{s}^2] - \sum (n-1)(\log s^2) = 57.62130 - 57.16031 = .46099$
4. $\chi^2 = (2.3026)(\text{diff.}) = (2.3026)(.46099) = 1.061$
5. $\text{Correction} = 1 + \dfrac{1}{3(r-1)}\left[\sum \dfrac{1}{n-1} - \dfrac{1}{\sum (n-1)}\right] = 1 + \left[\dfrac{1}{9}\right]\left[.31587 - \dfrac{1}{55}\right] = 1 + (.11111)(.31587 - .01818) = 1.033$
6. $\text{Corrected } \chi^2 = \chi^2/\text{correction} = 1.061/1.033 = 1.027$
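For unequal n's the same steps can be sketched with natural logarithms, which removes the need for the factor 2.3026. This is an illustrative implementation (the name `bartlett` is hypothetical) of lines 1 to 6 of Table 26. Working from unrounded values it gives a χ² near 1.09 for the Table 26 data rather than 1.061, because the table rounds s̄² to 11.16 before taking its logarithm; the conclusion (not significant) is unchanged.

```python
import math

def bartlett(within_ss, ns):
    """Bartlett's test for homogeneity of variance with unequal n's (Table 26).

    within_ss holds Σx² for each group and ns the number of cases in each.
    Returns (chi_square, correction, corrected_chi_square); df = r - 1.
    """
    r = len(ns)
    dfs = [n - 1 for n in ns]
    total_df = sum(dfs)                                     # Σ(n - 1)
    variances = [ss / df for ss, df in zip(within_ss, dfs)]
    pooled = sum(within_ss) / total_df                      # line 1
    chi_square = total_df * math.log(pooled) - sum(         # lines 2-4, natural logs
        df * math.log(s2) for df, s2 in zip(dfs, variances))
    correction = 1 + (sum(1 / df for df in dfs) - 1 / total_df) / (3 * (r - 1))  # line 5
    return chi_square, correction, chi_square / correction  # line 6

chi2, c, corrected = bartlett([244.0, 162.0, 110.0, 98.0], [21, 13, 15, 10])
print(f"chi-square {chi2:.3f}, correction {c:.3f}, corrected {corrected:.3f}")
```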


4. TRANSFORMATIONS 


The question may be raised as to the direction which the statistical analysis of the results of an experiment may take when the value of χ² does indicate significant differences in the error (within groups) mean squares. The precautions mentioned earlier in connection with the t test⁴ when heterogeneity of variance is present apply here also.

The presence of correlation between the variances and means within the various experimental conditions is an indication of departure from normality, and this is likely to be associated with heterogeneity of variance within the several groups. Since the F test depends upon the independence of the 2 mean squares (that between groups and that within groups), a fundamental condition of the analysis of variance would be violated. A simple precaution, then, is to examine the data to determine whether the means and variances of the separate experimental groups tend to be correlated. If this is the case, then a transformation of the original data to a new scale may correct the difficulty.⁵

The purpose of such a transformation is to change the scale 
of measurement to one in which the application of the analysis 
of variance is more likely to be valid, i.e., a scale in which the 
variance is more homogeneous. The transformation may also 
have the desirable effect not only of stabilizing the variance, but 
also of decreasing the skewness or nonnormality of the variable. 


5. THE SQUARE ROOT TRANSFORMATION 


In some psychological research, the data are recorded in 
terms of the number of errors made or of the number of correct 
responses made in a limited number of trials. These recorded 
values are then treated as continuous, and the experimenter may 


desire to evaluate the results of the experiment by means of the 
analysis of variance. If the counts are small, the distribution 
may be such that the means and variances tend to be propor- 


tional to one another within the various experimental conditions. 


4 Pages 165-167.
5 Some of the transformations which have proved useful are discussed in the sections which follow. Various other transformations are treated in an article by Bartlett (1947). A method of checking the extent to which the transformation has normalized the variable is to plot the cumulative percentage distribution of the transformed variable upon normal probability paper. If a cumulative percentage distribution of a normal distribution is plotted on this paper, the result is a straight line. Sheets of this paper may be obtained from most college book stores.




The transformation recommended by Bartlett (1936) for this situation is the square root transformation.

Instead of taking the actual recorded values of X, the values of √X are taken, and the analysis of variance is applied to these transformed values. When the means of the various groups are extremely small, in the range of 2 to 10, Bartlett suggests taking the transformation as √(X + .5), particularly when many of the recorded values of X are 0. With means smaller than 2.0, the transformation and subsequent application of the analysis of variance should not be considered.
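The advice above can be expressed as a small helper. This is a hypothetical coding of it, not a rule stated in this exact form in the text: the same transformation is applied to every group (so the groups stay on one scale), the decision uses the smallest group mean, √(X + .5) is chosen when that mean is 10 or less or any score is 0, and the helper refuses when any mean is below 2.0.

```python
import math

def square_root_transform(groups):
    """Apply one square root transformation to every group.

    Hypothetical coding of the advice in the text. Returns
    (offset, transformed_groups), where offset is 0.5 for sqrt(X + .5)
    and 0.0 for plain sqrt(X).
    """
    means = [sum(g) / len(g) for g in groups]
    if min(means) < 2.0:
        raise ValueError("group means below 2.0: transformation not advised")
    has_zero = any(x == 0 for g in groups for x in g)
    offset = 0.5 if (min(means) <= 10 or has_zero) else 0.0
    return offset, [[math.sqrt(x + offset) for x in g] for g in groups]

offset, transformed = square_root_transform([[2, 2, 0, 4, 3], [6, 6, 10, 12, 6]])
print(offset, [round(v, 2) for v in transformed[0]])   # prints: 0.5 [1.58, 1.58, 0.71, 2.12, 1.87]
```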


6. TWO EXAMPLES OF THE APPLICATION OF THE SQUARE ROOT 
TRANSFORMATION 


The application of the transformation √(X + .5) may be
shown in connection with an experiment by Sleight (1948). 
Sleight was concerned with the legibility of readings of various 
dial types. Five different dial types were investigated: horizontal, 
open window, round, vertical, and semicircular. For purposes 
of illustration, we have taken here only the records for the round, 
vertical, and semicircular dial types. We shall also assume that 
the subjects, whose records are available for each dial type, were 
assigned at random to the 3 experimental conditions. The data 
recorded were the number of errors made by each subject, and 
these are shown in Table 27. 


TABLE 27. Errors (X) Made by 3 Groups of Subjects in Reading 3 Different Dials

Round    Vertical    Semicircular
2        6           4
2        6           2
0        10          6
4        12          4
3        6           7
Sum        11     40     23
Mean       2.2    8.0    4.6
Variance   2.2    8.0    3.8


For each group of 5 subjects we have found the mean and the variance (mean square), and these are also given in Table 27. It is clear that the means and variances of the original data are proportional and highly correlated. It is also apparent that the variances tend toward heterogeneity. Let us see what effect a transformation of the data from the original scale to a scale of √(X + .5) will have upon the variances.
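As a check on Tables 27 and 28, the sketch below recomputes the means and variances on both scales. It assumes the fifth-row error counts 3, 6, and 7, which follow from the column sums 11, 40, and 23; the variances use the n − 1 divisor, as in the tables.

```python
import math
from statistics import mean, variance   # variance() uses the n - 1 divisor

dials = {
    "Round": [2, 2, 0, 4, 3],
    "Vertical": [6, 6, 10, 12, 6],
    "Semicircular": [4, 2, 6, 4, 7],
}

summary = {}
for name, errors in dials.items():
    roots = [math.sqrt(x + 0.5) for x in errors]   # the sqrt(X + .5) scale
    summary[name] = (mean(errors), variance(errors), mean(roots), variance(roots))
    raw_m, raw_v, root_m, root_v = summary[name]
    print(f"{name:<13} raw: mean {raw_m:.1f}, variance {raw_v:.1f}   "
          f"transformed: mean {root_m:.2f}, variance {root_v:.2f}")
```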

In Table 28, we have recorded the values of √(X + .5) for the original data, and the means and variances of the transformed data have been computed and entered at the bottom of the table.

TABLE 28. The √(X + .5) Transformation of the Error Scores Made by 3 Groups of Subjects in Reading 3 Different Dials

Round    Vertical    Semicircular
1.58     2.55        2.12
1.58     2.55        1.58
.71      3.24        2.55
2.12     3.54        2.12
1.87     2.55        2.74
Sum        7.86    14.43    11.11
Mean       1.57    2.89     2.22
Variance   .28     .22      .20

It is perfectly clear here that the means and variances are no longer proportional and that the heterogeneity of the variance has been reduced. The analysis of variance could now be applied to the transformed data. By stabilizing the variance through the transformation of the data to the new scale of √(X + .5), we have eliminated the correlation between the means and the variances, and hence, the mean square between groups and the mean square within groups will be independent.

An illustration of the square root transformation may also be given for the results of an experiment cited by Bartlett (1947). This study was concerned with weed infestation under 6 different treatments, and the data reported are weed-infestation counts for 4 replications of the treatments (4 observations for each condition). These data are given in Table 29.




TABLE 29. Weed Infestation Counts for 6 Experimental Treatments with 4 Replications

               Experimental Treatments
(1)      (2)      (3)      (4)      (5)      (6)
438      538      77       17       18       115
442      422      61       31       26       57
319      377      157      87       77       100
380      315      52       16       20       45
Sum        1,579    1,652    347      151      141      317
Mean       395      413      87       38       35       79
Variance   3,353    8,869    2,300    1,125    786      1,126


It may be observed here also that the means and variances 
tend to be proportional. The rank order of the means and vari- 
ances for the 6 treatments is identical, and the two are therefore 
highly correlated. If we transform the data by taking the square 
root of the various counts, we obtain the means and variances 
given in Table 30. 


TABLE 30. Square Root Transformation of the Weed Infestation Counts of Table 29

               Experimental Treatments
(1)      (2)      (3)      (4)      (5)      (6)
20.9     23.2     8.8      4.1      4.2      10.7
21.0     20.5     7.8      5.6      5.1      7.5
17.9     19.4     12.5     9.3      8.8      10.0
19.5     17.7     7.2      4.0      4.5      6.7
Sum        79.3     80.8     36.3     23.0     22.6     34.9
Mean       19.8     20.2     9.1      5.8      5.6      8.8
Variance   2.1      5.3      5.7      6.2      4.5      3.7


The variances of the transformed data are obviously much 
more homogeneous than the variances of the original data. The 
application of the analysis of variance to the transformed data 


would thus be much more plausible than in the case of the original 
data. 
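The claim that the square root removes the link between means and variances can be checked numerically. The sketch below uses the Table 29 counts (the entries 77 under treatment 5 and 45 under treatment 6 follow from the column sums and from Table 30) and reports, on each scale, the product-moment correlation between the 6 treatment means and variances and the ratio of largest to smallest variance. The helper names are illustrative.

```python
import math
from statistics import mean, variance   # variance() uses the n - 1 divisor

# Weed-infestation counts of Table 29: one list per treatment, 4 replications each.
treatments = [
    [438, 442, 319, 380],
    [538, 422, 377, 315],
    [77, 61, 157, 52],
    [17, 31, 87, 16],
    [18, 26, 77, 20],
    [115, 57, 100, 45],
]

def pearson(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def describe(groups):
    """Means and variances of each group, in treatment order."""
    return [mean(g) for g in groups], [variance(g) for g in groups]

raw_means, raw_vars = describe(treatments)
root_means, root_vars = describe([[math.sqrt(x) for x in g] for g in treatments])

for label, means, variances in (("original", raw_means, raw_vars),
                                ("square root", root_means, root_vars)):
    print(f"{label:>11}: r(mean, variance) = {pearson(means, variances):+.2f}, "
          f"largest/smallest variance = {max(variances) / min(variances):.1f}")
```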


7. THE LOGARITHMIC AND ANGULAR TRANSFORMATIONS 


Another situation in which heterogeneity of variance may be observed is that in which the standard deviations of the observations in the various groups tend to be proportional to the corresponding means. In this case a transformation to a logarithmic scale is recommended. When zero counts are present, the transformation may take the form of log (1 + X).
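Why the logarithm helps can be seen from log(kX) = log k + log X: multiplying a group by a constant shifts its logarithms without changing their spread. The sketch uses hypothetical groups built as multiples of one base pattern, so each standard deviation is exactly proportional to its mean; after the transformation the standard deviations are identical. With zero scores one would use log (1 + X), as the text notes, and the equalization is then only approximate.

```python
import math
from statistics import mean, stdev

# Hypothetical groups: each is a multiple of the same base pattern, so each
# standard deviation is exactly proportional to its mean.
base = [2.0, 3.0, 5.0, 8.0]
groups = [[k * x for x in base] for k in (1, 4, 16)]

log_sds = []
for g in groups:
    logs = [math.log10(x) for x in g]          # log X (no zero scores here)
    log_sds.append(stdev(logs))
    print(f"mean {mean(g):6.2f}  sd {stdev(g):6.2f}  sd/mean {stdev(g) / mean(g):.3f}"
          f"  sd of log X {stdev(logs):.4f}")
```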

A transformation has also been suggested for data recorded 
in terms of percentages. In an experiment by Child (1946), for 
example, groups of children were observed under a number of 
different experimental conditions, and the percentage in each 
group choosing a distant goal instead of a near goal was recorded. 
The transformation which would be applicable to this case is the
inverse sine or angular transformation. 

The angles corresponding to various percentages have been 
tabled by Bliss (1937), but this reference is not readily available. 
Bliss’s table has been reproduced, however, by Snedecor (1946a).° 
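In place of a printed table, the angular transformation can be computed directly: the transformed value is the angle whose sine is the square root of the proportion p = percentage/100. This is a sketch (the function name is illustrative), with the angle expressed in degrees.

```python
import math

def angular(percent):
    """Inverse sine (angular) transformation of a percentage, in degrees."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must lie between 0 and 100")
    return math.degrees(math.asin(math.sqrt(percent / 100)))

for p in (0, 10, 25, 50, 75, 100):
    print(p, round(angular(p), 2))
```

For example, 25 per cent becomes 30 degrees and 50 per cent becomes 45 degrees.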


8. THE RECIPROCAL TRANSFORMATION 

In studying the influence of varying amounts of incentive upon speed of running in 2 groups of 7 rats each, Crespi (1942) found a significant difference in the variances for a 1-unit and a 4-unit incentive group. F = s₁²/s₂² was 72.4 when time was used as a measure of performance. For the 6 and 6 degrees of freedom available, a value of F equal to 8.47 is significant at the 2 per cent level. Because of this, and for other reasons which he discusses in some detail, Crespi changed his unit of measurement. Instead of using the time required to run the path, he took the reciprocals of the time. With this transformation the variance within the 2 groups was stabilized, as indicated by the F ratio of 1.06.

A reciprocal transformation such as used by Crespi may prove to be useful in other psychological studies. This may be the case, for example, in word-association or reaction-time experiments or in studies of problem solving where one of the variables recorded is time of reaction or time to completion.
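Crespi's raw times are not given here, so the following is only a sketch of the two steps his comparison involves: the variance ratio F (larger variance over smaller) and the reciprocal transformation from time to speed. The function names and the two lists of running times are hypothetical.

```python
from statistics import variance

def variance_ratio(a, b):
    """F as the larger sample variance over the smaller."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)

def reciprocal(times):
    """Reciprocal transformation: time to run the path -> 1/time (a speed)."""
    if any(t <= 0 for t in times):
        raise ValueError("times must be positive")
    return [1 / t for t in times]

# Hypothetical running times (seconds) for two groups of 7 animals.
times_a = [10.2, 12.5, 15.1, 20.4, 26.0, 31.8, 40.3]
times_b = [3.1, 3.4, 3.6, 3.9, 4.2, 4.5, 5.0]
print(variance_ratio(times_a, times_b),
      variance_ratio(reciprocal(times_a), reciprocal(times_b)))
```

Each F would then be referred to the F table for 6 and 6 degrees of freedom, as in Crespi's comparison.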

9. THE ANALYSIS OF VARIANCE WITHOUT TRANSFORMATION 

It is not intended to give the impression that conclusions drawn from the analysis of variance applied to original data recorded in terms of number of errors, or number of correct choices, or percentages will always be in error. For example, the errors for all 5 dial types (we gave only the results for 3 of the dial types in Table 27) in the experiment by Sleight (1948) were transformed by means of √(X + .5). An analysis of variance of the transformed data resulted in exactly the same conclusions that were obtained by an analysis of the original data.

7 Bartlett (1947).
8 Yates (1943) and Eisenhart (1947b).

The analysis of the transformed data, however, resulted in an F of 11.92, while analysis of the original data yielded an F of 8.87 for the mean square for dial types. A value of F equal to 5.07 would have been significant for the number of degrees of freedom available. Thus, although no conclusions concerning significance were changed by the analysis of the transformed data, we see that the probability attaching to the F obtained in the second analysis is much smaller than the probability value of the F for the original data.

Child (1946) also reports that the analysis of the percentages 
in his experiment as transformed by the angular transformation 
resulted in no changes in conclusions concerning significance. 

In summary, Snedecor (1946a) has said: 


Much experimental data, however, whether expressed in percentages 
or frequencies, may be safely subjected to variance analysis. Percentages 
which express frequency of occurrence per hundred units when calculated 
from counts of 100 or more in the numerator and when ranging between 
20% and 80%, may be expected to yield valid tests in analysis of variance. 
Similarly, counts..., for example, running into three figures, usually 
offer no difficulties. 

On the other hand, if percentages result from less than 100 affected 
individuals, or if the event enumerated is infrequent, some transformation 
of the variable may be necessary before analysis of variance is carried 
through.⁹


10. EXAMPLES 


1. Test the hypothesis of homogeneity of variance for the 
data of Example 4, Chapter 10. 

2. Test the hypothesis of homogeneity of variance for the
data of Example 6, Chapter 10. 


9 Snedecor (1946a), p. 316.




3. In a problem analyzed earlier (Example 7, Chapter 10), 
we had records available for the performance of 4 groups of subjects 
tested by 4 different operators. The data are repeated here for 


convenience. 


Subjects Operator A Operator B Operator C Operator D 


1 6 5 4 8
2 4 3 ? 5
3 3 4 3 6
4 7 3 7 7
5 13 3 4 7 
6 9 4 8 9 
7 4 0 7 7 
8 10 3 4 10 
9 8 4 8 4 
10 9 4 4 3 
11 8 3 11 3 
12 5 3 13 
13 5 4 9 
14 10 2 
15 9 5 
16 15 3 
17 10 3 
18 6 1 
19 4 2 
20 5 


It was pointed out at the time that these data were discussed that there was a tendency for the means of the various groups to be proportional to the standard deviations, and that a transformation would be desirable. (a) Find the means and standard deviations for each of the groups to determine this for yourself. (b) Test the original variances for homogeneity of variance. (c) Now transform the data to the logarithmic scale log (X + 1). Find the means and standard deviations of the transformed data. Are the means and standard deviations still proportional? (d) Perform an analysis of variance with the transformed data and compare your findings with the analysis of Example 7, Chapter 10.

4. Suppose that an experimenter has tested 4 groups of subjects under apparently the same experimental conditions, but on different days. Let us assume that the means of the 4 groups do not vary more than could be reasonably attributed to chance. The sum of squares within each group and the n's are given below. Test for homogeneity of variance by means of the χ² test. What possible explanations might be offered for the results obtained?


Group n Σx²
1 23 2,200.00 
2 13 3,888.00 
3 11 4,440.00 
4 21 3,920.00 


5. In the spring of 1939, Newcomb (1943) gave an attitude 
test to students at 3 colleges in which the “attitude-climate” 
was assumed to differ on the issue in question. The colleges were 
Bennington, Williams, and Catholic University. The summary 
data are given below. The sum of squares within any single 
group may be obtained by squaring the standard deviation and 
then multiplying by n − 1. Test the variances for homogeneity.


Bennington Williams Catholic 
Mean 41.2 43.6 53.6 
s 7.2 9.8 6.8 
n 174 312 83 


6. Suppose subjects were randomly assigned to experimental 
conditions A, B, and C. The outcomes of the experiment are 
given below. 


Condition A Condition B Condition C 
12 0 E! 
12 0 4 
19 2 0 
24 4 8 
12 4 6 
11 5 4 
19 5 5 
22 0 0 
11 7 9
il 3 z 


(a) Find the means and variances for each group. Note that they tend to be proportional. (b) Find the value of F for a two-part analysis. (c) Now transform the data to the scale √(X + .5). Find the means and variances. Has the variance been stabilized by the transformation? (d) Repeat the analysis of variance with the transformed data and interpret your results.


CHAPTER 12 


The 2ⁿ Factorial Design for Experiments
in Which Variables Are Varied in
Only 2 Ways


1. INTRODUCTION 


Many experiments are concerned with 2 or more variables 
each of which may be varied in several ways. When the variables 
are studied in all possible combinations in the same experiment, 
the experiment is said to be of factorial design. In this chapter we 
shall consider only factorial designs in which each variable under 
investigation is varied in 2 ways. If the experiment involves n variables, and if each variable is varied in 2 ways, we shall have 2ⁿ experimental conditions.
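The count of 2ⁿ conditions can be checked by listing every combination of levels. A sketch with three hypothetical variables A, B, and C, each varied in 2 ways (the level labels are placeholders):

```python
from itertools import product

# Hypothetical factorial experiment with n = 3 variables, each varied in 2 ways.
levels = {
    "A": ("a1", "a2"),
    "B": ("b1", "b2"),
    "C": ("c1", "c2"),
}
conditions = list(product(*levels.values()))   # every combination of levels
print(len(conditions))                          # prints: 8, i.e. 2 ** 3
for condition in conditions:
    print(", ".join(condition))
```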


2. THE 2 X 2 FACTORIAL DESIGN 


As an illustration, let us suppose that we are interested in the 
retention of verbal material. One of our variables concerns the 
mode of presentation of the material and this is varied in 2 ways. 
A passage is read to subjects, and we shall call this the auditory 
mode of presentation. In the other case, the subjects themselves
read the passage, and this we shall call the visual mode of pres- 
entation. Another variable concerns the time of testing and this 
is varied in 2 ways also. An "immediate" test is given directly 
after the presentation of the material. The “delayed” test is 
given some time after the presentation of the material, perhaps 
12 or 24 hr. later. 

From the rules for permutations and combinations, it can 
easily be determined that we have here (2)(2) = 4 possible 
combinations of the experimental variables. Each of these
combinations will constitute one of our experimental conditions.
We thus have visual presentation-immediate test; visual presenta- 
tion-delayed test; auditory presentation-immediate test; auditory 
presentation-delayed test. Let us suppose that we have 40 sub- 
jects available and that they are divided at random into 4 groups 
of 10 subjects each. The groups are then assigned at random to 
the experimental conditions. The outcomes of this hypothetical
experiment are given in Table 31.


TABLE 31. Retention Scores of 4 Groups of Subjects Tested under 4
Different Experimental Conditions

             Visual                 Auditory
       Immediate  Delayed     Immediate  Delayed

          76         36           43         37
          66         45           75         22
          43         47           66         22
          62         23           46         25
          65         43           56         11
          43         43           62         27
          42         54           51         23
          60         45           63         24
          78         41           52         25
          66         40           50         31

Sum      601        417          564        247


We begin our analysis of the data in a manner already
familiar. We shall first find the total sum of squares, then the
sum of squares between the 4 experimental conditions, and finally
the sum of squares within groups. Proceeding as before, we have

$$\text{Total} = (76)^2 + (66)^2 + (43)^2 + \cdots + (31)^2 - \frac{(1{,}829)^2}{40} = 11{,}253.975$$

$$\text{Between} = \frac{(601)^2}{10} + \frac{(417)^2}{10} + \frac{(564)^2}{10} + \frac{(247)^2}{10} - \frac{(1{,}829)^2}{40} = 7{,}788.475$$

$$\text{Within} = \text{total} - \text{between} = 11{,}253.975 - 7{,}788.475 = 3{,}465.500$$

As a check on the arithmetic, the sum of squares within
groups may be obtained directly by calculating the variation
within each of the groups about the group means. Thus, for the
first group, we have

$$\sum x_1^2 = (76)^2 + (66)^2 + (43)^2 + \cdots + (66)^2 - \frac{(601)^2}{10} = 1{,}582.9$$

The sums of squares within the other groups may be calculated in
the same way. Finding each of these sums of squares, we would
then see that their sum is equal to the sum of squares
within groups. Thus,


1,582.9 + 590.1 + 890.4 + 402.1 = 3,465.5 
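The hand computations above can be checked with a short Python sketch (an illustration only; the book works everything by desk calculator). It applies the same raw-score formulas, sum of squares minus the correction term (ΣX)²/N, to the scores of Table 31. The tenth score in each column is the value implied by the column sums 601, 417, 564, and 247 and confirmed by the within-group sums of squares just given.

```python
# Table 31 retention scores, one list of 10 per experimental condition.
groups = {
    "visual-immediate":   [76, 66, 43, 62, 65, 43, 42, 60, 78, 66],
    "visual-delayed":     [36, 45, 47, 23, 43, 43, 54, 45, 41, 40],
    "auditory-immediate": [43, 75, 66, 46, 56, 62, 51, 63, 52, 50],
    "auditory-delayed":   [37, 22, 22, 25, 11, 27, 23, 24, 25, 31],
}

scores = [x for g in groups.values() for x in g]
correction = sum(scores) ** 2 / len(scores)        # (1,829)^2 / 40

total_ss = sum(x * x for x in scores) - correction
between_ss = sum(sum(g) ** 2 / len(g) for g in groups.values()) - correction
within_ss = total_ss - between_ss

# Direct check: variation of each group about its own mean.
within_by_group = {name: sum(x * x for x in g) - sum(g) ** 2 / len(g)
                   for name, g in groups.items()}

print(f"Total   {total_ss:10.3f}")
print(f"Between {between_ss:10.3f}")
print(f"Within  {within_ss:10.3f}")
for name, ss in within_by_group.items():
    print(f"  {name:20s} {ss:8.1f}")
print(f"  {'sum':20s} {sum(within_by_group.values()):8.1f}")
```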


The summary of the analysis is given in Table 32. The
sum of squares between groups and the sum of squares within
groups have been divided by the number of degrees of freedom
associated with each to arrive at the mean squares, 2,596.158 and
96.264, respectively. The value of F may now be determined and
will be equal to 2,596.158/96.264 = 26.969. From the table of F,
we find that for 3 and 36 degrees of freedom a value of 4.38 will be
significant at the 1 per cent point and that our obtained value of
26.969 is thus highly significant, with a probability much less than
.01. The hypothesis of random sampling from a common normal
population would be rejected.

TABLE 32. Two-Part Analysis of Variance of Retention Scores of 4 Groups
of Subjects

Source of Variation    Sum of Squares    df    Mean Square        F
Between groups             7,788.475      3      2,596.158    26.969
Within groups              3,465.500     36         96.264
Total                     11,253.975     39
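A minimal sketch of the two-part table, assuming only the sums of squares and degrees of freedom already found, is given below. The 1 per cent point 4.38 is copied from the text rather than computed; if SciPy happens to be available, `scipy.stats.f.sf(F, 3, 36)` would return the exact tail probability instead.

```python
# Sums of squares and df for the two-part analysis (Table 32).
ss = {"between": 7788.475, "within": 3465.500}
k, N = 4, 40                                # groups and total observations
df = {"between": k - 1, "within": N - k}

ms = {source: ss[source] / df[source] for source in ss}
F = ms["between"] / ms["within"]

F_01 = 4.38   # 1 per cent point of F for 3 and 36 df, as quoted in the text

print(f"MS between = {ms['between']:.3f}, MS within = {ms['within']:.3f}")
print(f"F = {F:.3f}; exceeds the 1 per cent point: {F > F_01}")
```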

An examination of the data of Table 31 indicates that the 
means obtained for the various experimental conditions differ. 
If the variances within the several groups may be assumed to be 
homogeneous, then it seems a logical conclusion that it is the 
variation in the means of the experimental groups which results in 
the highly significant value of F which we have obtained. Before 
proceeding with the analysis of variance in this experimental
design, we may examine the variances within groups to see whether
the hypothesis of homogeneity of variance is tenable. 


3. TESTING FOR HOMOGENEITY OF VARIANCE


Each of the sums of squares within the several groups, when
divided by the number of degrees of freedom, in this case 9, will
give us a value of s². Under the hypothesis of random sampling
from a common population, each of these values of s² is an estimate
of the same variance. The values of s², reading from left to right
in Table 31, will be 175.9, 65.6, 98.9, and 44.7. It may seem that
these vary rather widely for estimates of a common population
variance. The important question, of course, is whether they vary
sufficiently to result in the rejection of the hypothesis of a common
population variance.

The method of testing for homogeneity of variance was
described earlier. The test applied to the data of the present
experiment is shown in Table 33.

TABLE 33. Bartlett's Test of Homogeneity of Variance for Retention Scores
of 4 Groups of Subjects

Group      n     n − 1      s²       log s²
  1       10       9      175.9     2.24527
  2       10       9       65.6     1.81690
  3       10       9       98.9     1.99520
  4       10       9       44.7     1.65031
Sum                       385.1     7.70768

Computations:

$$\bar{s}^2 = \frac{\sum s^2}{k} = \frac{385.1}{4} = 96.28; \qquad \log \bar{s}^2 = 1.98354$$

$$k \log \bar{s}^2 = (4)(1.98354) = 7.93416$$

$$\text{Diff.} = k \log \bar{s}^2 - \sum \log s^2 = 7.93416 - 7.70768 = 0.22648$$

$$\chi^2 = (2.3026)(n - 1)(\text{Diff.}) = (2.3026)(9)(0.22648) = 4.693$$

The uncorrected value of χ² is found to be equal to 4.693, and,
since we have 4 groups, we shall have 3 degrees of freedom for
evaluating this χ². By reference to the table of χ², we find that
for 3 degrees of freedom a value of 7.815 will have a probability of
.05. Since our obtained value is smaller than this, it will not be
regarded as significant, and there is no need to apply the correction.
Because the correction divides χ² by a factor greater than 1, the
corrected χ² will have a value of P even larger than the value of
approximately .20 for the uncorrected value of 4.693. Hence, we may
conclude that the variation of our values of s² is within the limits
of random sampling from a population with a common variance and
that the hypothesis of homogeneity of variance is therefore tenable.
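For readers who prefer to check Bartlett's test by machine, here is a sketch in Python. It works with natural logarithms, which removes the 2.3026 conversion factor, and it assumes the standard form of the correction divisor, C = 1 + [Σ1/(n − 1) − 1/Σ(n − 1)]/[3(k − 1)]. The helper `chi2_sf` is our own series evaluation of the chi-square tail area, not a library routine; with SciPy installed, `scipy.stats.chi2.sf` does the same job. Because the code uses unrounded variances, its χ² (about 4.694) differs from the tabled 4.693 only in the last place.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square > x) for `df` degrees of freedom.

    Uses the power series for the regularized lower incomplete gamma
    function P(df/2, x/2); adequate for the moderate values met here.
    """
    s, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    term = total = 1.0 / s
    n = 1
    while term > 1e-15 * total:
        term *= z / (s + n)
        total += term
        n += 1
    lower = total * math.exp(-z + s * math.log(z) - math.lgamma(s))
    return 1.0 - lower

def bartlett(ss_within, dfs):
    """Bartlett's test from within-group sums of squares and their df.

    Returns (uncorrected chi-square, corrected chi-square, df).
    """
    k = len(ss_within)
    variances = [ss / d for ss, d in zip(ss_within, dfs)]
    total_df = sum(dfs)
    pooled = sum(ss_within) / total_df        # df-weighted mean of the s^2 values
    chi2 = total_df * math.log(pooled) - sum(d * math.log(v) for d, v in zip(dfs, variances))
    correction = 1 + (sum(1 / d for d in dfs) - 1 / total_df) / (3 * (k - 1))
    return chi2, chi2 / correction, k - 1

chi2, chi2_corr, df = bartlett([1582.9, 590.1, 890.4, 402.1], [9, 9, 9, 9])
print(f"uncorrected chi-square = {chi2:.3f}, corrected = {chi2_corr:.3f}, df = {df}")
print(f"P(chi-square > {chi2:.3f}) = {chi2_sf(chi2, df):.2f}")
```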


4. PARTITIONING THE SUM OF SQUARES BETWEEN GROUPS 


Let us now return to the analysis of variance of the data of 
the experiment. In a 2 × 2 factorial experiment, such as this, the
sum of squares between groups may be further partitioned into as 
many component parts as there are degrees of freedom associated 
with it. In this problem the sum of squares between groups is 
based upon 3 degrees of freedom, and we may analyze it into 3 
meaningful parts, each part based upon a single degree of freedom. 

One of these sums of squares will be based upon the difference 
between the 2 modes of presentation and will involve a comparison 
of the sums of scores for the visual and auditory modes of presenta- 
tion. Another of the sums of squares will be based upon the dif- 
ference between the times of testing and will involve a comparison 
of the sums of scores for the immediate and the delayed tests. 
The third sum of squares will be based upon the interaction of the 
2 variables, mode of presentation and time of testing. The mean- 
ing of a significant interaction will be discussed in greater detail 
later. For the time being, it may suffice to say that it would 
indicate a differential in response to the auditory and visual modes 
of presentation at the immediate and delayed time of testing. 

Let us see how the sums of squares described above are cal- 
culated. Several methods are available in terms of earlier discus- 
sions. For the mode of presentation, we may combine the scores 
for the immediate and delayed visual conditions and obtain the 
sum, 601 + 417 = 1,018. This sum will be based upon an n of 
10 + 10 = 20 cases. Similarly, if we combine the immediate and
delayed scores for the auditory mode of presentation, we obtain
the sum 564 + 247 = 811. This sum will also be based upon an n 
of 10 + 10 = 20 cases. It is important to note that the 2 sets of 
10 scores in each of these 2 groups have been obtained under 
exactly the same conditions, except for the difference in mode of 
presentation. Ten observations in each group have been obtained 
for the immediate test, and 10 observations in each group have 
been obtained for the delayed test. The only way in which the 
20 observations in the 2 groups differ is that in the one case all 20 
are based upon a visual presentation and in the other all 20 are 
based upon an auditory presentation. 

The sum of squares for mode of presentation may be obtained
in the usual way. Thus,

$$\frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} - \frac{(\sum X)^2}{N} = \frac{(1{,}018)^2}{20} + \frac{(811)^2}{20} - \frac{(1{,}829)^2}{40} = 1{,}071.225$$

We may note here that the correction term (1,829)²/40 is the same
as that calculated earlier. An alternate method of calculation,
which may be used whenever the number of cases upon which the
2 sums are based is the same, is that given earlier by formula (61).
Thus,

$$\frac{(\sum X_1 - \sum X_2)^2}{kn} = \frac{(1{,}018 - 811)^2}{(2)(20)} = \frac{(207)^2}{40} = 1{,}071.225$$
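The agreement of the two routes is easy to confirm numerically. In the sketch below (the function names are our own hypothetical helpers), the first function is the general between-sums formula and the second is formula (61), which divides the squared difference of the 2 sums by (2)(n) and is valid only when both sums rest on the same n.

```python
def ss_between_two_sums(sum1, n1, sum2, n2):
    """Usual form: sum1^2/n1 + sum2^2/n2 - (sum1 + sum2)^2/(n1 + n2)."""
    return sum1 ** 2 / n1 + sum2 ** 2 / n2 - (sum1 + sum2) ** 2 / (n1 + n2)

def ss_formula_61(sum1, sum2, n):
    """Formula (61): (sum1 - sum2)^2 / ((2)(n)); both sums must rest on n cases."""
    return (sum1 - sum2) ** 2 / (2 * n)

visual = 601 + 417      # 1,018 from 20 scores
auditory = 564 + 247    # 811 from 20 scores

print(ss_between_two_sums(visual, 20, auditory, 20))
print(ss_formula_61(visual, auditory, 20))
```

Both calls give 1,071.225, up to floating-point rounding in the last digits; calling them with the immediate and delayed sums (1,165 and 664) reproduces the 6,275.025 found for time of testing.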

The sum of squares for time of testing is obtained by combin- 
ing the scores of the auditory and visual immediate tests to obtain 
the sum 601 + 564 = 1,165. This sum will be based upon an n 
of 10 + 10 = 20 cases. Similarly, we combine the scores for the
auditory and visual delayed tests to obtain the sum, 417 + 247 =
664. This sum will also be based upon an n of 10 + 10 = 20
cases. We may note that the only respect in which the 2 sets of 10
observations in each of these 2 groups differ is that of time of
testing. Ten observations in each group are based upon an
auditory presentation, and 10 in each group are based upon the
visual presentation. The 20 observations in each group do differ
with respect to time of testing, all 20 in one of the groups being 
based upon the immediate test, and all 20 in the other being based 
upon the delayed test. 

The sum of squares between the 2 conditions, immediate and
delayed time of testing, may be obtained by either of the methods
used above for the sum of squares for mode of presentation. Thus,

$$\frac{(1{,}165)^2}{20} + \frac{(664)^2}{20} - \frac{(1{,}829)^2}{40} = 6{,}275.025$$

or

$$\frac{(1{,}165 - 664)^2}{(2)(20)} = \frac{(501)^2}{40} = 6{,}275.025$$


5. CALCULATION OF THE INTERACTION SUM OF SQUARES 
BASED ON 1 DEGREE OF FREEDOM 


The interaction sum of squares may be obtained most easily 
by subtraction. It was pointed out earlier that the sum of squares 
between groups (7,788.475) must equal the sum of the 3 sums of 
squares for mode of presentation, time of testing, and interaction 
between these 2 variables. We have calculated directly 2 of these 
sums of squares, namely, that for mode of presentation and that 
for time of testing; and hence the third, that for interaction, may 
be obtained by

$$\text{Between groups} - \text{mode of presentation} - \text{time of testing} = \text{interaction}$$

$$7{,}788.475 - 1{,}071.225 - 6{,}275.025 = 442.225$$


The sum of squares for interaction may also be obtained
directly as a check upon the accuracy of our arithmetic. This is
accomplished by setting up a 2 × 2 table for the 2 variables, time
of testing and mode of presentation, as shown in Table 34. Then
the interaction sum of squares may be obtained by entering the
sums corresponding to the cells of the table in the formula below.

TABLE 34. Schematic Representation of the 2 × 2 Table for Computing the
Interaction Sum of Squares with 1 Degree of Freedom

Time of           Mode of Presentation
Testing           Visual      Auditory
Immediate           a            b
Delayed             c            d

$$\text{Interaction} = \frac{[(a + d) - (b + c)]^2}{kn} \qquad (62)$$

The value of k in formula (62) is the sum of the squares of the
coefficients of the sums compared. Thus, we have a = 601,
b = 564, c = 417, and d = 247. The coefficients of these sums
will be +1, −1, −1, and +1, respectively. Squaring and sum-
ming these coefficients gives us k = 4. The value of n is the
number of observations upon which each of the sums is based.
Formula (62) will be applicable for the interaction sum of squares
of any 2 variables, each of which is varied in 2 ways.

It should be obvious that, regardless of how the 2 × 2 table
is set up, as long as one variable is represented by the X axis and
the other variable by the Y axis, the sums which will be compared
by formula (62) are those for "visual-delayed plus auditory-
immediate" and "visual-immediate plus auditory-delayed." Sub-
stituting in the formula, we obtain for the interaction sum of
squares

$$\frac{[(601 + 247) - (564 + 417)]^2}{(4)(10)} = \frac{(848 - 981)^2}{40} = 442.225$$
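The whole partition of the between-groups sum of squares can be sketched from the 4 cell sums alone, using the letters of Table 34. Each single-degree-of-freedom component is written here as a contrast of the cell sums with coefficients of +1 or −1, so k = 4 and n = 10 throughout; for the main effects this is the same as formula (61) with 2 sums of 20 cases each. This is an illustrative Python sketch, not the book's notation.

```python
# Cell sums laid out as in Table 34 (rows: time of testing; columns: mode).
a, b = 601, 564    # immediate: visual, auditory
c, d = 417, 247    # delayed:   visual, auditory
n = 10             # observations behind each cell sum
k = 4              # four coefficients of +1 or -1, squared and summed

ss_between = (a**2 + b**2 + c**2 + d**2) / n - (a + b + c + d) ** 2 / (4 * n)

ss_mode = ((a + c) - (b + d)) ** 2 / (k * n)          # visual vs auditory
ss_time = ((a + b) - (c + d)) ** 2 / (k * n)          # immediate vs delayed
ss_interaction = ((a + d) - (b + c)) ** 2 / (k * n)   # formula (62)

print(f"mode {ss_mode:.3f}, time {ss_time:.3f}, interaction {ss_interaction:.3f}")
print(f"sum of components {ss_mode + ss_time + ss_interaction:.3f} = between {ss_between:.3f}")
```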


6. ALLOCATION OF THE DEGREES OF FREEDOM 


We have now accomplished what we started out to do: The 
sum of squares between the 4 experimental conditions, based upon 
3 degrees of freedom, has been partitioned into 3 component parts, 
each part being associated with but a single degree of freedom. 
The degrees of freedom may be thought of as being allocated in the 
following ways: The sum of squares for mode of presentation, for 
example, is based upon the variation of the mean for the visual
presentation and the mean for the auditory presentation about the 
combined mean. The combined mean is thus used as an estimate 
of a parameter, and, for reasons discussed previously, 1 degree of 
freedom will be lost each time this occurs. Thus, we have only 1
degree of freedom for the sum of squares based upon the variation
of these 2 means.¹ For the same reason, we will have 1 degree of
freedom for the sum of squares for time of testing. The degrees
of freedom for the interaction sum of squares will be the product
of the degrees of freedom associated with the variables for which
the interaction is being computed. Thus, in the present problem
we have (1)(1) = 1 degree of freedom for the interaction sum of
squares.²


7. INTERPRETATION OF THE EXPERIMENT 


The summary of our analysis is given in Table 35. Each
sum of squares in which we are interested has been divided by the
corresponding number of degrees of freedom to arrive at a mean
square. The F ratios are obtained by dividing the mean squares
indicated by the mean square based upon the variation of subjects
treated alike, the mean square within groups, which is the appro-
priate error term for testing the significance of the other mean
squares. In general, it can be said that whenever replication is
present within the experimental design, the within-groups mean
square, or mean square based upon replication, is the appropriate
error term against which to test the significance of all other mean
squares.³

TABLE 35. Complete Analysis of Variance of the Retention Scores of
4 Groups of Subjects

Source of Variation           Sum of Squares    df    Mean Square       F
Mode of presentation              1,071.225      1      1,071.225    11.13
Time of testing                   6,275.025      1      6,275.025    65.19
Interaction: mode × time            442.225      1        442.225     4.59
Within groups                     3,465.500     36         96.264
Total                            11,253.975     39

¹ In terms also of an earlier discussion, the restriction placed upon the
data in this comparison is that ΣX₁ + ΣX₂ must equal ΣX, or, in other
words, 1,018 + 811 must equal 1,829. Only one of the values is free to vary;
once it is given, the other may be obtained by subtraction from the total.

² Just as the interaction sum of squares could be found by subtraction,
so also may the degrees of freedom for this sum of squares be found by sub-
traction. Thus: between groups − mode of presentation − time of testing =
interaction, and, substituting the degrees of freedom corresponding to these
sums of squares, we have 3 − 1 − 1 = 1.

All the values of F in Table 35 are based upon 1 and 36 degrees 
of freedom, and from the table of F we find that a value of 4.11 will 
be significant at the 5 per cent point and a value of 7.39 at the 1 
per cent point. The F's for mode of presentation and time of
testing thus have probabilities of less than .01, while that for 
interaction has a probability of less than .05. All 3 values of F 
may be regarded as significant. 
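As a sketch of how the entries of Table 35 and the verdicts above follow from the sums of squares, the Python below divides each 1-df mean square by the within-groups mean square and compares the ratio with the tabled values 4.11 and 7.39 quoted in the text (those critical values are hard-coded, not computed).

```python
# Mean square within groups (Table 35): 3,465.500 on 36 df.
ms_within = 3465.500 / 36

# Each effect has 1 df, so its mean square equals its sum of squares.
effects = {
    "mode of presentation": 1071.225,
    "time of testing": 6275.025,
    "interaction: mode x time": 442.225,
}

# Tabled points of F for 1 and 36 df, as quoted in the text.
F_05, F_01 = 4.11, 7.39

results = {}
for name, ms in effects.items():
    F = ms / ms_within
    if F > F_01:
        verdict = "P < .01"
    elif F > F_05:
        verdict = "P < .05"
    else:
        verdict = "not significant"
    results[name] = (F, verdict)
    print(f"{name:26s} F = {F:6.2f}  {verdict}")
```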

In view of the tests made, the significant value of F for mode 
of presentation demonstrates quite conclusively that the null 
hypothesis of random sampling from a common population is not 
tenable and provides definite support for our inference that the 
visual mode of presentation is superior to the auditory, as far as 
retention is concerned, under the conditions of this experiment. 
Similarly, there is every reason to believe that the immediate test 
is superior to the delayed. On the average, retention is signifi- 
cantly greater at the time of the immediate test than at the time of 
the delayed test. 

The significant value of F for interaction will require a bit of
explaining. As pointed out earlier, a significant interaction
indicates a differential in response to the auditory and visual
modes of presentation at the time of the immediate and delayed
testings. This is clearly shown in Table 36. The difference
between retention on the visual-immediate and visual-delayed
tests is equal to 601 − 417 = 184. The difference between reten-
tion on the auditory-immediate and auditory-delayed tests is equal
to 564 − 247 = 317. It is the failure of these 2 differences to be
alike that accounts for interaction. Obviously, the discrepancy in
retention between an immediate and delayed test is less when the
material is presented visually than when the material is presented
auditorily.

TABLE 36. Sums of Retention Scores for Mode of Presentation and Time
of Testing

Time of             Mode of Presentation
Testing             Visual     Auditory     Difference
Immediate             601         564            37
Delayed               417         247           170
Difference            184         317          −133

³ A possible exception to this rule would be when the categories or
classifications of one or more of the variables may be regarded as a random
selection from the population being sampled. This point is discussed in
greater detail on pp. 247-252.

It may also be observed that the difference in retention on the
visual-immediate and the auditory-immediate tests is equal to
601 − 564 = 37. The difference between the visual-delayed and
the auditory-delayed tests is equal to 417 − 247 = 170. It is
apparent here also that the discrepancy in retention between
auditory and visual presentations on the immediate test is less
than it is on the delayed test.

To see clearly that the interaction sum of squares depends
upon the difference between the marginal differences of Table 36,
note first that the difference between the marginal differences of
either the columns or the rows is −133. Working backwards
from this difference, we have for the columns:

$$-133 = (184 - 317) = (601 - 417) - (564 - 247)$$

Substituting the letters from Table 34 for the sums of this last
expression, we have (a − c) − (b − d). Similarly, we have for the
rows:

$$-133 = (37 - 170) = (601 - 564) - (417 - 247) = (a - b) - (c - d)$$

We thus have

$$(a - c) - (b - d) = (a - b) - (c - d) = (a + d) - (b + c)$$

It is in the form of this last expression that we have put the term
within the brackets of formula (62) for the interaction sum of
squares. The greater the discrepancy between the values of
(a − b) and (c − d), or any of the other equivalent expressions, the
greater will be the interaction sum of squares.

Apparently then, both of the following statements are true: 
The difference in retention of visually presented materials on an 
immediate test and on a delayed test is much less than the differ- 
ence in retention of material presented auditorily; also the differ- 
ence between the retention of visually presented and auditorily 
presented material is much less on an immediate test than it is on 
a delayed test. In the light of the test of significance, we say that 
there is an interaction between the 2 variables, time of testing and 


FACTORIAL DESIGN FOR EXPERIMENTS 219 


mode of presentation, which cannot be accounted for in terms of 
the limits of random sampling. 


8. THE 2 × 2 × 2 FACTORIAL DESIGN


The partitioning of the sum of squares between groups into
the 3 components and the tests of significance of the mean squares
based upon these components completes the analysis of variance
of the 2 × 2 factorial design. Let us now add still a third variable,
varied in 2 ways, to the experiment, so that we shall have (2)(2)(2)
= 8 experimental conditions to investigate. The third variable
will be the number of times the material is presented, and this will
be varied by giving 1 and 2 presentations. Let us assume that
we have 80 subjects and that they are divided at random into 8
groups of 10 subjects each. One group of subjects will be tested
under each of the 8 experimental conditions. We shall further
assume that the data previously analyzed (the data of Table 31)
are the outcomes for the single presentation, and to these data
we shall add the outcomes for the additional 4 experimental
conditions involving 2 presentations of the material. The design
and the data are given in Table 37.


TABLE 37. Outcomes of an Experiment on Retention for 8 Groups of Subjects
Tested under Different Experimental Conditions

              One Presentation                      Two Presentations
        Visual             Auditory             Visual             Auditory
    Immed.  Delayed    Immed.  Delayed     Immed.  Delayed    Immed.  Delayed
      76      36         43      37          94      74         67      67
      66      45         75      22          85      74         64      60
      43      47         66      22          80      64         70      54
      62      23         46      25          81      86         65      51
      65      43         56      11          80      68         60      49
      43      43         62      27          80      72         55      38
      42      54         51      23          69      62         57      55
      60      45         63      24          80      64         66      56
      78      41         52      25          63      78         79      68
      66      40         50      31          58      61         80      58
Sum  601     417        564     247         770     703        663     556

The total sum of squares, the sum of squares between the 8
experimental groups, and the sum of squares within groups may
be found in the usual way.

$$\text{Total} = (76)^2 + (66)^2 + (43)^2 + \cdots + (58)^2 - \frac{(4{,}521)^2}{80} = 25{,}886.0$$

$$\text{Between} = \frac{(601)^2}{10} + \frac{(417)^2}{10} + \cdots + \frac{(556)^2}{10} - \frac{(4{,}521)^2}{80} = 19{,}507.9$$

$$\text{Within} = \text{total} - \text{between} = 25{,}886.0 - 19{,}507.9 = 6{,}378.1$$


As a check upon the arithmetic, we may calculate the sum of
squares within each of the groups. Then the sum of these sums of
squares should equal the sum of squares within groups. For the
first group, we have

$$\sum x_1^2 = (76)^2 + (66)^2 + (43)^2 + \cdots + (66)^2 - \frac{(601)^2}{10} = 1{,}582.9$$


The sums of squares for the other groups may be found in a similar 
fashion. Then, adding these, we will have 


1,582.9 + 590.1 + 890.4 + 402.1 + 1,026.0 
+ 576.1 + 624.1 + 686.4 = 6,378.1 


which is equal to the value obtained by subtraction. 
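The same computations for all 8 groups can be sketched in Python as a check on the totals and on the values of s² quoted in the next paragraph. The tenth score in each column of Table 37 is the one implied by the printed column sums and confirmed by the within-group sums of squares listed above.

```python
# Table 37 scores, keyed by (presentations, mode, time); 10 subjects per group.
table37 = {
    ("one", "visual", "immediate"):   [76, 66, 43, 62, 65, 43, 42, 60, 78, 66],
    ("one", "visual", "delayed"):     [36, 45, 47, 23, 43, 43, 54, 45, 41, 40],
    ("one", "auditory", "immediate"): [43, 75, 66, 46, 56, 62, 51, 63, 52, 50],
    ("one", "auditory", "delayed"):   [37, 22, 22, 25, 11, 27, 23, 24, 25, 31],
    ("two", "visual", "immediate"):   [94, 85, 80, 81, 80, 80, 69, 80, 63, 58],
    ("two", "visual", "delayed"):     [74, 74, 64, 86, 68, 72, 62, 64, 78, 61],
    ("two", "auditory", "immediate"): [67, 64, 70, 65, 60, 55, 57, 66, 79, 80],
    ("two", "auditory", "delayed"):   [67, 60, 54, 51, 49, 38, 55, 56, 68, 58],
}

scores = [x for group in table37.values() for x in group]
correction = sum(scores) ** 2 / len(scores)          # (4,521)^2 / 80

total_ss = sum(x * x for x in scores) - correction
between_ss = sum(sum(g) ** 2 / len(g) for g in table37.values()) - correction
within_ss = total_ss - between_ss

# Within-group sums of squares and variances s^2 = SS / (n - 1).
group_ss = [sum(x * x for x in g) - sum(g) ** 2 / len(g) for g in table37.values()]
variances = [ss / (len(g) - 1) for ss, g in zip(group_ss, table37.values())]

print(f"Total {total_ss:.1f}  Between {between_ss:.1f}  Within {within_ss:.1f}")
print("Group SS:", [round(ss, 1) for ss in group_ss])
print("s^2:     ", [round(v, 1) for v in variances])
```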

Each of the sums of squares within each of the various experi-
mental conditions, when divided by the number of degrees of
freedom, in this case 9, will give a value of s². Under the hypothesis
of random sampling from a common population these values will
all be estimates of the same parameter. Making the necessary
divisions, we have as the values of s², reading from left to right in
Table 37, 175.9, 65.6, 98.9, 44.7, 114.0, 64.0, 69.3, and 76.3. It
may seem that these values vary rather widely for estimates of a
common parameter, and the experimenter might wish to test this
hypothesis before proceeding with the analysis of variance.

The manner of testing for homogeneity of variance has al-
ready been described, and the test will not be repeated here with
all of the calculations. It will suffice to say that the uncorrected
value of χ² which is obtained is equal to 5.91. Since we have 8
groups, the number of degrees of freedom available for evaluating
the χ² will be 7. From the table of χ² we find that a value of χ²
equal to 5.91 for 7 degrees of freedom has a probability of about .50,
and there is no need to calculate the corrected value of χ². The
data offer no evidence against the hypothesis of random sampling
from a population or populations with a common variance.

The analysis of variance up to this point has resulted in a
breakdown of the total sum of squares and degrees of freedom into
2 parts. One part is associated with the differences between the 8
experimental conditions or groups of subjects treated differently,
and is based upon 7 degrees of freedom. The other part is asso-
ciated with the variation of subjects treated alike and is based upon
72 degrees of freedom. This analysis is shown in Table 38. The
test of significance will be given by F = 2,786.843/88.585 = 31.46.
From the table of F we find that for 7 and 72 degrees of freedom,
the value of 31.46 is highly significant with a probability much less
than .01.

TABLE 38. Two-Part Analysis of Variance of Retention Scores for 8 Groups
of Subjects Tested under Different Experimental Conditions

Source of Variation    Sum of Squares    df    Mean Square        F
Between groups             19,507.9       7      2,786.843     31.46
Within groups               6,378.1      72         88.585
Total                      25,886.0      79

Since the hypothesis tested by F is that of random sampling
from a common normal population, and since the supplementary
test of homogeneity of variance offered no evidence of significant
differences in the variances, the significant value of F may be
attributed to the differences in means. We now wish to localize
the source of these differences more precisely and to draw inferences
concerning the influence upon retention of the 3 experimental
variables being investigated, mode of presentation, number of
presentations, and time of testing, and the interactions between
these variables. The design of this experiment is such that it
readily lends itself to a further partitioning of the sum of squares
between the 8 experimental conditions. In the 2 × 2 × 2
factorial design, the sum of squares and the 7 degrees of freedom
between the 8 experimental conditions may be further analyzed
into 7 component parts, each part based upon a single degree of
freedom.


9. SUMS OF SQUARES FOR THE MAIN EXPERIMENTAL VARIABLES 


The variable, number of presentations, varies in 2 ways, and 
so we shall have 1 degree of freedom for this sum of squares (1 less 
than the number of sums or means which are to be compared). 
The 2 sums for this variable will be the sum of scores for the 4 
groups with but a single presentation and the sum of scores for the 
4 groups with two presentations. These 2 sums are 601 + 417 + 
564 + 247 = 1,829 for the single presentation, and 770 + 703 + 
663 + 556 = 2,692 for the double presentation. Each of these 2 
sums will be based upon 10 + 10 + 10 + 10 = 40 observations. 
Then the sum of squares for number of presentations will be given
by

$$\frac{(1{,}829)^2}{40} + \frac{(2{,}692)^2}{40} - \frac{(4{,}521)^2}{80} = 9{,}309.6$$

It may also be observed that formula (61) will apply to this case
and that we could also obtain

$$\frac{(1{,}829 - 2{,}692)^2}{(2)(40)} = \frac{(-863)^2}{80} = 9{,}309.6$$


We also have the variable, mode of presentation, which 
varies in 2 ways, visual and auditory, and so we shall have 1 degree 
of freedom here. The 2 sums which are to be compared are the 
sum of scores for the 4 groups given the visual presentation, and 
the sum of scores for the 4 groups given the auditory presentation. 
These 2 sums are 601 + 417 + 770 + 703 = 2,491 for the visual 
mode of presentation and 564 + 247 + 663 + 556 = 2,030 for the 
auditory mode of presentation. Again, each of these 2 sums will 
be based upon 10 + 10 + 10 + 10 = 40 observations. By means 
of formula (61), we obtain as the sum of squares for mode of
presentation

$$\frac{(2{,}491 - 2{,}030)^2}{(2)(40)} = \frac{(461)^2}{80} = 2{,}656.5$$


We also have a comparison between the immediate and 
delayed tests, and this will be based upon 1 degree of freedom. 
The 2 sums for this comparison will be based upon the sum of 
scores for the 4 groups with the immediate test and the sum of 
scores for the 4 groups with the delayed test. These 2 sums are 
601 + 564 + 770 + 663 = 2,598 for the immediate test, and
417 + 247 + 703 + 556 = 1,923 for the delayed test. Each of
these sums will also be based upon 40 observations. Then, by
means of formula (61), we obtain as the sum of squares for time of
testing

$$\frac{(2{,}598 - 1{,}923)^2}{(2)(40)} = \frac{(675)^2}{80} = 5{,}695.3$$
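The three main-effect sums of squares can be sketched from the 8 cell sums of Table 37. The helper `marginal` (a hypothetical name) adds the cell sums for one level of one variable, and `ss_formula_61` applies formula (61) with n = 40 cases behind each marginal sum.

```python
# Cell sums of Table 37 keyed by (presentations, mode, time); n = 10 per cell.
sums = {
    ("one", "visual", "immediate"): 601, ("one", "visual", "delayed"): 417,
    ("one", "auditory", "immediate"): 564, ("one", "auditory", "delayed"): 247,
    ("two", "visual", "immediate"): 770, ("two", "visual", "delayed"): 703,
    ("two", "auditory", "immediate"): 663, ("two", "auditory", "delayed"): 556,
}
n = 10

def marginal(position, level):
    """Total of the cell sums whose variable at `position` is at `level`."""
    return sum(s for key, s in sums.items() if key[position] == level)

def ss_formula_61(sum1, sum2, n_each):
    """Formula (61): squared difference of 2 sums over (2)(n_each)."""
    return (sum1 - sum2) ** 2 / (2 * n_each)

main_effects = {
    "number of presentations": (marginal(0, "one"), marginal(0, "two")),
    "mode of presentation": (marginal(1, "visual"), marginal(1, "auditory")),
    "time of testing": (marginal(2, "immediate"), marginal(2, "delayed")),
}
for name, (s1, s2) in main_effects.items():
    ss = ss_formula_61(s1, s2, 4 * n)        # each marginal sum pools 4 groups of 10
    print(f"{name:24s} {s1:5d} vs {s2:5d}   SS = {ss:7.1f}")
```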


We have accounted for 3 of the 7 degrees of freedom. Three 
of the remaining 4 degrees of freedom will be associated with the 
various simple interactions of the variables. We will thus have an 
interaction between number of presentations and mode of presenta- 
tion with 1 degree of freedom (given by the product of the degrees 
of freedom of the interacting variables). The interaction between 
mode of presentation and time of testing will account for another 
degree of freedom. A third degree of freedom will be assigned to 
the interaction between number of presentations and time of 
testing. These interactions are all similar to those calculated 
earlier and are often referred to as first order or simple interactions, 
since they are based upon the interaction of but 2 variables. 


10. THE INTERACTION SUMS OF SQUARES 


The sums of squares for the 3 simple interactions may be
obtained from the 2 × 2 tables corresponding to Table 34, de-
scribed earlier, by substituting the sums corresponding to the
cell entries of each 2 × 2 table in formula (62). The sums from
Table 39 will give the sum of squares for the interaction between
number of presentations and mode of presentation.

TABLE 39. Sums of Retention Scores for Number of Presentations
and Mode of Presentation

Mode of            Number of Presentations
Presentation          One          Two
Visual               1,018        1,473
Auditory               811        1,219

By formula (62), we thus obtain

$$\frac{[(1{,}018 + 1{,}219) - (1{,}473 + 811)]^2}{(4)(20)} = \frac{(2{,}237 - 2{,}284)^2}{80} = 27.6$$


The sums from Table 40 will give the sum of squares for the
interaction between time of testing and number of presentations.

TABLE 40. Sums of Retention Scores for Time of Testing and
Number of Presentations

Number of             Time of Testing
Presentations      Immediate     Delayed
One                  1,165          664
Two                  1,433        1,259

Substituting the sums from the table in formula (62), we obtain

$$\frac{[(1{,}165 + 1{,}259) - (664 + 1{,}433)]^2}{(4)(20)} = \frac{(2{,}424 - 2{,}097)^2}{80} = 1{,}336.6$$


From the sums in Table 41 we may obtain the sum of squares
for the interaction between time of testing and mode of presenta-
tion.

TABLE 41. Sums of Retention Scores for Time of Testing and
Mode of Presentation

Mode of               Time of Testing
Presentation       Immediate     Delayed
Visual               1,371        1,120
Auditory             1,227          803

Thus, substituting the sums from this table in formula (62),
we have

$$\frac{[(1{,}371 + 803) - (1{,}120 + 1{,}227)]^2}{(4)(20)} = \frac{(2{,}174 - 2{,}347)^2}{80} = 374.1$$
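The 3 collapsed tables, Tables 39 to 41, need not be built by hand. The sketch below (illustrative only; the helper names are ours) sums the 8 cell sums over the third variable and then applies formula (62) with k = 4 and n = 20, since each collapsed cell pools 2 groups of 10 subjects.

```python
from itertools import combinations

# Cell sums of Table 37 keyed by (presentations, mode, time); n = 10 per cell.
sums = {
    ("one", "visual", "immediate"): 601, ("one", "visual", "delayed"): 417,
    ("one", "auditory", "immediate"): 564, ("one", "auditory", "delayed"): 247,
    ("two", "visual", "immediate"): 770, ("two", "visual", "delayed"): 703,
    ("two", "auditory", "immediate"): 663, ("two", "auditory", "delayed"): 556,
}
n = 10
names = ("number", "mode", "time")
levels = (("one", "two"), ("visual", "auditory"), ("immediate", "delayed"))

def collapse(i, j):
    """2 x 2 table of sums for variables i and j, summed over the third variable."""
    table = {}
    for key, s in sums.items():
        table[(key[i], key[j])] = table.get((key[i], key[j]), 0) + s
    return table

def interaction_ss(i, j):
    """Formula (62) on the collapsed table: [(a + d) - (b + c)]^2 / (k n)."""
    t = collapse(i, j)
    (p1, p2), (q1, q2) = levels[i], levels[j]
    a, b = t[(p1, q1)], t[(p1, q2)]
    c, d = t[(p2, q1)], t[(p2, q2)]
    n_cell = 2 * n                     # each collapsed cell pools 2 groups of 10
    return ((a + d) - (b + c)) ** 2 / (4 * n_cell)

interactions = {f"{names[i]} x {names[j]}": interaction_ss(i, j)
                for i, j in combinations(range(3), 2)}
for name, ss in interactions.items():
    print(f"{name:15s} {ss:8.1f}")
```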


The sums of squares for the simple interactions computed 
above have accounted for 3 of the 4 degrees of freedom that were 
left after we had found the sums of squares for the 3 experimental 
variables. The single remaining degree of freedom will be asso- 
ciated with the second-order or triple interaction, based upon the 
interaction of the 3 variables. The number of degrees of freedom 
for any higher-order interaction may be determined in the same 
way that we determine degrees of freedom for simple interactions. 
The degrees of freedom will be equal to the product of the degrees 
of freedom associated with the interacting variables. Since we 
have but 1 degree of freedom for each of the 3 variables, we will 
have but 1 degree of freedom for the second-order interaction. 

The sum of squares for the triple interaction may be calcu- 
lated directly, but is obtained most easily by subtraction. The 
sum of squares 19,507.9 between the 8 experimental conditions will 
be equal to the sum of the sums of squares for mode of presentation, 
number of presentations, time of testing, the 3 simple interactions, 
and the second-order interaction. Hence, if we calculate all the 
sums of squares except that for the second-order interaction, this 
sum of squares will be given by subtraction of the other 6 values 
from the sum of squares between the 8 experimental groups. 

The sum of the 6 sums of squares calculated so far is equal to 19,399.7, and by subtraction from the sum of squares between groups, we obtain

$$19{,}507.9 - 19{,}399.7 = 108.2$$

for the sum of squares for the second-order interaction.
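The subtraction can be written out as a short Python check. The values are the sums of squares reported in this section; the dictionary keys are illustrative labels only.

```python
# Sum of squares between the 8 experimental conditions (7 degrees of freedom)
ss_between_conditions = 19507.9

# The six sums of squares already found, each with 1 degree of freedom
ss_components = {
    "number of presentations": 9309.6,
    "mode of presentation": 2656.5,
    "time of testing": 5695.3,
    "number x mode": 27.6,
    "number x time": 1336.6,
    "mode x time": 374.1,
}

subtotal = sum(ss_components.values())
# What is left belongs to the single remaining degree of freedom.
ss_triple = ss_between_conditions - subtotal
print(f"{subtotal:.1f} {ss_triple:.1f}")   # 19399.7 108.2
```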


11. THE INTERPRETATION OF THE EXPERIMENT 


The summary of our analysis is presented in Table 42, where we have divided the sums of squares by the number of degrees of freedom to obtain the mean squares. The values of F which have been entered have been obtained by dividing each of the mean squares which is to be tested for significance by the mean square based upon the variation within groups treated alike. Thus, each value of F in the table will be based upon 1 and 72 degrees of freedom. No value of F was calculated for the interaction between number of presentations and mode of presentation, as this mean square is obviously not significantly larger than the mean square within groups.

TABLE 42. Complete Analysis of Variance of Retention Scores for 8 Groups
of Subjects Tested under Different Conditions

Source of Variation                    Sum of Squares   df   Mean Square       F
Between number of presentations              9,309.6     1       9,309.6   105.07
Between modes of presentation                2,656.5     1       2,656.5    29.98
Between times of testing                     5,695.3     1       5,695.3    64.28
Interaction: number X mode                      27.6     1          27.6
Interaction: number X time                   1,336.6     1       1,336.6    15.09
Interaction: mode X time                       374.1     1         374.1     4.22
Interaction: number X mode X time              108.2     1         108.2     1.22
Within groups                                6,378.1    72          88.6
Total                                       25,886.0    79

4 See Chap. 13, pp. 242-246.
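A sketch of how the mean squares and F ratios of Table 42 follow from the sums of squares and degrees of freedom. It assumes, as the printed F values suggest, that each mean square was divided by the within-groups mean square rounded to 88.6; the critical values 3.976 and 7.00 are the interpolated figures quoted in the next paragraph. Unlike the table, the sketch also prints an F for the number X mode interaction.

```python
# (source, sum of squares, degrees of freedom) as listed in Table 42
effects = [
    ("number of presentations", 9309.6, 1),
    ("mode of presentation", 2656.5, 1),
    ("time of testing", 5695.3, 1),
    ("number x mode", 27.6, 1),
    ("number x time", 1336.6, 1),
    ("mode x time", 374.1, 1),
    ("number x mode x time", 108.2, 1),
]
ss_within, df_within = 6378.1, 72

# Assumption: the table divides by the within-groups mean square rounded to 88.6.
ms_within = round(ss_within / df_within, 1)
F_05, F_01 = 3.976, 7.00   # interpolated critical values for 1 and 72 df, from the text

rows = []
for source, ss, df in effects:
    ms = ss / df
    f = ms / ms_within
    rows.append((source, ms, f))
    flag = "**" if f > F_01 else "*" if f > F_05 else ""
    print(f"{source:24s} {ms:9.1f} {f:8.2f} {flag}")

# The components and the within-groups term add up to the total line of the table.
ss_total = sum(ss for _, ss, _ in effects) + ss_within
df_total = sum(df for _, _, df in effects) + df_within
print(f"total {ss_total:.1f} on {df_total} df")
```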

By interpolation in the table of F we find that for 1 and 72 degrees of freedom, a value which is equal to 3.976 will be significant at the 5 per cent point, and a value of 7.00 will be significant at the 1 per cent point. The interpretation of the significant values of F for number of presentations, mode of presentation, and time of testing is clear and straightforward. An examination of the original data shows that the visual mode is superior to the auditory (the visual sums total 2,491 and the auditory 2,030), and the test of significance indicates that the difference is such that it cannot be accounted for by random sampling from a common population. The 2 means, we would say, differ significantly. Similarly, the immediate test is superior to the delayed (2,598 against 1,923), and the double presentation is superior to the single (2,692 against 1,829). All these comparisons meet the requirements of statistical significance.

The lack of significance of the interaction between number of presentations and mode of presentation is readily understood from an examination of the sums upon which this interaction is based (Table 39). That interaction between these 2 variables is almost completely absent may be noted in the manner described earlier. Compare, for example, the difference 1,018 − 811 = 207 and the difference 1,473 − 1,219 = 254. Compare also the difference 1,018 − 1,473 = −455 and the difference 811 − 1,219 = −408. It is the similarity between these differences that accounts for the absence of interaction.

Now examine the sums in Table 40, upon which the highly significant interaction between number of presentations and time of testing is based. Note that the difference in retention for 1 and 2 presentations on the immediate test is not nearly so great as the difference which is present on the delayed test. These 2 differences are 1,165 − 1,433 = −268 and 664 − 1,259 = −595, respectively. Similarly, note that with 1 presentation, the difference in retention between the immediate and delayed tests is much greater than the difference in retention between the immediate and delayed tests with 2 presentations. These differences are 1,165 − 664 = 501 and 1,433 − 1,259 = 174, respectively. It is the failure of these differences to be alike which results in a significant interaction mean square.

We have observed that the interaction between number of presentations and time of testing is significant. What does this mean from the experimenter's point of view? The clue to the interpretation is provided by the differences noted above. We could say that the difference observed in retention for material presented once and material presented twice depends upon whether we test for retention immediately or after an interval of time. The difference observed on an immediate test is much less than the difference observed on the delayed test. Similarly, we could say that the difference observed in retention between an immediate and a delayed test depends upon whether the material has been presented once or twice. With a single presentation the difference is much greater than it is with 2 presentations. These two alternatives are, of course, logically equivalent.

Triple interactions, such as the one we have here, may be interpreted in much the same manner in which we have approached the simple interactions. If significant, this interaction would indicate that the difference observed between the numbers of presentations depends upon the mode of presentation and the time of testing; that the difference observed between the modes of presentation depends upon the time of testing and the number of presentations; or that the difference observed between the times of testing depends upon the mode of presentation and the number of presentations. These three statements are also logically equivalent.

Interpretations similar to those we have made for simple and 
second-order interactions may be made for third- and fourth-order 
interactions and for interactions of still higher order. 


12. TESTING WHETHER ONE MEAN SQUARE IS SIGNIFICANTLY 
SMALLER THAN ANOTHER 


One additional point may be made with respect to the experiment we have analyzed. The mean square for the interaction between number of presentations and mode of presentation was 27.6, and since this was smaller than the mean square within groups, 88.6, no value of F was computed. Although the test is not pertinent to the experiment, it could be determined whether F is significantly small,⁵ or, in other words, whether the mean square of 27.6 is significantly smaller than the mean square 88.6.

Compute F in the usual manner with the mean square within 
groups in the denominator and then find the reciprocal of this 
value. (This will result in the same value as if the larger mean 
square were divided by the smaller mean square.) Now enter 
the F table with the degrees of freedom for the numerator and 
denominator interchanged, i.e., in the present example, the column 
entry would be taken as 72 degrees of freedom and the row entry 
as 1 degree of freedom. According to the table of F, for 72 and 1 
degrees of freedom, a value of approximately 253.0 would be 
required for significance with a P equal to .05. The F computed in
the manner described is 3.21 and is obviously not significant. 
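In code, the procedure amounts to inverting the ratio and comparing it with the tabled value for the interchanged degrees of freedom. The sketch below hard-codes the approximate critical value of 253.0 quoted above rather than computing it; no F-distribution library is assumed.

```python
ms_interaction = 27.6   # number x mode, 1 df
ms_within = 88.6        # within groups, 72 df

f_usual = ms_interaction / ms_within   # about 0.31, i.e., less than 1
f_inverted = 1 / f_usual               # same as ms_within / ms_interaction

# With numerator and denominator df interchanged (72 and 1), the text gives
# an approximate 5 per cent value of 253.0.
F_05_72_1 = 253.0
print(f"{f_inverted:.2f}", f_inverted > F_05_72_1)   # 3.21 False
```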

As pointed out previously, the situations where the F obtained 
in this manner is significant probably have no reasonable inter- 
pretation other than that they are the occasionally significant 
values which are to be expected in random sampling. 


5 Hoel (1947), Snedecor (1946)). 




13. A METHOD OF SHOWING THE COMPARISONS IN THE 2ⁿ
FACTORIAL DESIGN


A method has been described by Yates (1937) and by Brandt (1941) which does much to clarify the nature of the comparisons involved in the 2ⁿ factorial experiment. In the 2 X 2 X 2 factorial experiment described, for example, the sources of the sum of squares within groups and of the sum of squares between the 8 experimental conditions are fairly obvious. But perhaps not quite so obvious is the analysis of the sum of squares between the 8 experimental conditions into the 7 component parts, each associated with a single degree of freedom. The method to be described shows the sources of the sums to be compared in the 2 X 2 X 2 factorial design and may be used with more complex experiments as well.⁶

Let us take the sums upon which the sum of squares between the 8 experimental conditions is based and enter these sums at the top of Table 43. Each of these sums is based upon 10 observations, and this figure is entered under the n column of the table. In the first row of the table, we have indicated the manner in which the various sums will be treated in obtaining a sum of squares based upon the variation between "number of presentations." Under every sum which involves a single presentation, a plus sign has been entered, and for each of the sums with a double presentation, a minus sign has been entered. These plus and minus signs stand for the coefficients of plus 1 and minus 1 which are to be associated with the sums at the top of the table. The coefficient indicates the number of times and the manner in which the sum at the top of the column enters into the comparison. In the 2 X 2 X 2 factorial experiment all the coefficients will be equal to unity in absolute value, so that the numbers are not written in the body of the table.

The algebraic sum of the column headings, after multiplication by the coefficients in the first row, is entered in column 9. Thus, −863 was obtained by taking

$$(601 + 417 + 564 + 247) - (770 + 703 + 663 + 556) = -863$$


6 See Brandt (1941) for an illustration. 


TABLE 43. The Calculation of the Sums of Squares with 1 Degree of Freedom in the 2ⁿ Factorial Design

                               SINGLE PRESENTATION       DOUBLE PRESENTATION
                                Vis.        Aud.          Vis.        Aud.
                              Imm.  Del.  Imm.  Del.    Imm.  Del.  Imm.  Del.     Sum     k     n   (Sum)²/kn
                               (1)   (2)   (3)   (4)     (5)   (6)   (7)   (8)     (9)   (10)  (11)     (12)
                               601   417   564   247     770   703   663   556
1. Number of presentations      +     +     +     +       -     -     -     -     -863     8    10    9,309.6
2. Mode of presentation         +     +     -     -       +     +     -     -      461     8    10    2,656.5
3. Time of testing              +     -     +     -       +     -     +     -      675     8    10    5,695.3
4. Number X mode                +     +     -     -       -     -     +     +      -47     8    10       27.6
5. Number X time                +     -     +     -       -     +     -     +      327     8    10    1,336.6
6. Mode X time                  +     -     -     +       +     -     -     +     -173     8    10      374.1
7. Number X mode X time         +     -     -     +       -     +     +     -      -93     8    10      108.1
Sum                             7    -1    -1    -1      -1    -1    -1    -1      287    56          19,507.8


The value of k entered in row 1 and column 10 is obtained by squaring the coefficients in the row and summing the squares. We thus have

$$(1)^2 + (1)^2 + (1)^2 + (1)^2 + (-1)^2 + (-1)^2 + (-1)^2 + (-1)^2 = 8$$
In exactly the same way that we entered the values in row 1 for the comparison of the variable, number of presentations, the values have been entered in row 2 and row 3 for the comparisons of the variables "mode of presentation" and "time of testing." The coefficients entered in the rows for the simple interactions are obtained by multiplying the corresponding coefficients in the two rows for the interacting variables. Thus, for the coefficients in row 4, we multiply the coefficients for number of presentations, row 1, by the corresponding coefficients for mode of presentation, row 2.

The coefficients for the second-order interaction, row 7, may be obtained by multiplying the coefficients for any one of the simple interactions by the corresponding coefficients for the single variable which is not involved in the simple interaction. For example, we could multiply the coefficients of row 4 by those of row 3; or we could multiply the coefficients of row 5 by those of row 2; or we could multiply the coefficients of row 6 by those of row 1. Any one of these multiplications would give the coefficients for the second-order interaction.

Certain checks are provided for the various operations. The algebraic sums of the coefficients for each column are entered in the last row of the table. The sum of the squares of these figures should equal the sum of column 10, i.e., Σk, which in the present problem is 56; here (7)² + 7(−1)² = 56. When the sums entered at the top of the table are multiplied by the corresponding entries in the last row of the table, they should have an algebraic sum equal to the algebraic sum of column 9, which in the present instance is 287; here (7)(601) − (417 + 564 + 247 + 770 + 703 + 663 + 556) = 4,207 − 3,920 = 287. The sum of column 12, the sum of the sums of squares, should equal the sum of squares between the 8 experimental conditions, previously found to be 19,507.9, and does so within errors of rounding.
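Table 43 can also be generated rather than written out by hand. The Python sketch below is one way to do it, assuming the columns follow the order of the table (presentations, then mode, then time, with the last varying fastest), which is the order `itertools.product` produces. It builds rows 1 to 3 from the +1/−1 levels, forms the interaction rows by multiplying corresponding coefficients, fills in columns 9, 10, and 12, and runs the three checks just described. The helper name `times` is illustrative.

```python
from itertools import product

# Sums at the top of Table 43, columns (1)-(8)
sums = [601, 417, 564, 247, 770, 703, 663, 556]
n = 10   # observations behind each sum

# +1 = single, visual, immediate; -1 = double, auditory, delayed.
levels = list(product([1, -1], repeat=3))
number = [lv[0] for lv in levels]
mode = [lv[1] for lv in levels]
time = [lv[2] for lv in levels]


def times(*rows):
    """Multiply corresponding coefficients of one or more rows."""
    out = [1] * len(rows[0])
    for row in rows:
        out = [x * y for x, y in zip(out, row)]
    return out


rows = {
    "1. Number of presentations": number,
    "2. Mode of presentation": mode,
    "3. Time of testing": time,
    "4. Number X mode": times(number, mode),
    "5. Number X time": times(number, time),
    "6. Mode X time": times(mode, time),
    "7. Number X mode X time": times(number, mode, time),
}

results = {}
for name, coef in rows.items():
    total = sum(c * s for c, s in zip(coef, sums))     # column 9
    k = sum(c * c for c in coef)                        # column 10
    results[name] = (total, k, total ** 2 / (k * n))    # column 12
    print(f"{name:28s} {total:5d} {k:3d} {results[name][2]:9.1f}")

# The checks described in the text
col_sums = [sum(coef[j] for coef in rows.values()) for j in range(len(sums))]
check_k = sum(c * c for c in col_sums)                  # compare with the sum of column 10
check_sum = sum(c * s for c, s in zip(col_sums, sums))  # compare with the sum of column 9
sum_k = sum(k for _, k, _ in results.values())
sum_col9 = sum(t for t, _, _ in results.values())
ss_sum = sum(ss for _, _, ss in results.values())       # compare with 19,507.9
print(col_sums, check_k, sum_k, check_sum, sum_col9, round(ss_sum, 1))
```

Carried without rounding, column 12 sums to about 19,507.9; the 19,507.8 in the table is the total of the rounded entries.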

Although computations by means of Table 43 may not be as convenient as they are by the usual methods of the analysis of variance, a study of the table should be of assistance in gaining a clear understanding of the sums which are compared in the analysis.


14. ORTHOGONAL COMPARISONS 


Table 43 also indicates the nature of what has been described as a set of orthogonal comparisons.⁷ The term orthogonal refers to the condition where the comparisons in a set are all independent. The test for orthogonality of the set of comparisons in Table 43 is easily made. If the comparisons are independent, then the following conditions will hold: (1) the sum of the coefficients in each row of the table must equal zero; (2) the sum of the products of the corresponding pairs of coefficients in any pair of rows must be equal to zero. For rows 1 and 2, for instance, the products are +1, +1, −1, −1, −1, −1, +1, +1, which sum to zero. It can readily be determined that the comparisons of Table 43 satisfy these conditions and are, therefore, orthogonal.
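The two conditions can be checked mechanically. This sketch rebuilds the seven coefficient rows of Table 43 (assuming the table's column order: presentations, mode, time, last varying fastest) and tests both conditions. The final call uses a hypothetical pair of comparisons on 3 groups, group 1 against group 2 and group 1 against group 3, which fails condition (2).

```python
from itertools import combinations, product

levels = list(product([1, -1], repeat=3))   # Table 43 column order
number, mode, time = ([lv[i] for lv in levels] for i in range(3))


def times(*rows):
    """Multiply corresponding coefficients of one or more rows."""
    out = [1] * len(rows[0])
    for row in rows:
        out = [x * y for x, y in zip(out, row)]
    return out


table_43_rows = [number, mode, time,
                 times(number, mode), times(number, time), times(mode, time),
                 times(number, mode, time)]


def is_orthogonal_set(rows):
    """Apply the two conditions stated in the text to a list of coefficient rows."""
    # (1) the coefficients in each row sum to zero
    if any(sum(row) != 0 for row in rows):
        return False
    # (2) for every pair of rows, the products of corresponding coefficients sum to zero
    return all(sum(x * y for x, y in zip(r, s)) == 0
               for r, s in combinations(rows, 2))


print(is_orthogonal_set(table_43_rows))              # True
print(is_orthogonal_set([[1, -1, 0], [1, 0, -1]]))   # False
```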


15. SOME ADVANTAGES OF THE FACTORIAL DESIGN 


The factorial design has a number of merits which may now 
be pointed out. It may be noted that the full number of subjects, 
i.e., 80, entered into every comparison made, despite the fact that 
the experimental groups each contained but 10 subjects. This is 
because, to take but one comparison, that between the number 
of presentations, the 80 subjects could be split into a group of 40 
who were alike in having a single presentation, and another group 
of 40 alike in that they had two presentations. The 40 subjects 
within each of the 2 groups were not all alike in all other respects, 
but they were alike in groups of 10. Corresponding to the group 
of 10 subjects with a single presentation who were given a visual 
presentation and an immediate test, there was a corresponding 
group of 10 subjects with a double presentation who were also 
given a visual presentation and an immediate test. The difference 
between the retention scores of these 2 groups of 10 subjects, if any 
difference exists other than that which can be attributed to random 
sampling, may be ascribed to the factor in which they differ, the 
number of presentations. Similarly, for each of the other 3 sets 
of 10 subjects in the group given a single presentation, there will be 
a corresponding set of 10 subjects in the group of 40 subjects given 
a double presentation. For each of these corresponding sets of 10 observations the experimental conditions are constant except for the one factor, number of presentations. Thus, in comparing the 2 sets of 40 observations, we are testing for the influence of the one condition which is different for each set of 40, namely, the number of presentations.

7 Yates (19330), Fisher (1942).

It should also be observed that the sum of squares which is used as an estimate of uncontrolled variation of subjects treated alike, the within groups sum of squares, is based upon 72 degrees of freedom. The corresponding sum of squares in an experiment involving but a single variable and but 2 groups, to be based upon a comparable number of degrees of freedom, would require 37 subjects in each group, since 2 groups of 37 provide (2)(36) = 72 degrees of freedom within groups. And this experiment would provide information only about the single variable. In the factorial experiment, on the other hand, we not only have information about the several variables, but also about the interactions between these variables. This in turn means that the outcomes of the experiment provide a sounder basis for generalizing about the effectiveness of the experimental variables, since they are tested not only in isolation, but in conjunction with the effects of other variables.


16. EXAMPLES 

1. Suppose that we have an experiment in which 4 variables A, B, C, and D are each varied in 2 ways. Suppose, further, that the experiment is to be of factorial design and that 5 subjects are to be assigned to each experimental condition. (a) Using the subscripts 1 and 2 to represent the ways in which each variable is to be varied, write the symbols corresponding to the various experimental conditions. For example, one condition will be given by A₁B₁C₁D₁. (b) Set up the summary table showing the sources of variation, including all interactions, and the number of degrees of freedom associated with each source.

2. The following data have been modified from an experiment by Glanville, Kreezer, and Dallenbach (1946). The problem was an investigation of the accuracy of apprehension of written words under various experimental conditions. Three variables were selected for study: time of exposure, type size, and background. Type size was varied in 2 ways: the words were printed in 6-pt. and 12-pt. type. For each of these type sizes the background was varied in 2 ways, with a blank and with a printed background.


For each of these type-background combinations, exposure time 
was varied in 2 ways with a 60 m. sec. and a 120 m. sec. interval. 
Unfortunately, for our purposes, a list of only 100 words was used
in the test conditions, and for the longer exposure times the means 
were all close to the upper limit with the associated result of very 
small variances for the longer exposure times. Not only is the 
variance heterogeneous, but, because the means for the longer 
exposure times approach the upper limit, the distributions for these 
conditions were probably markedly skewed. We shall do no 
violence to the conclusions arrived at by the experimenters if we 
assume slightly different experimental conditions than those 
actually used. Let us assume that subjects were assigned at 
random to the 8 experimental conditions and that 50 subjects were 
used in each test situation. Let us further assume that the lists 
contained more than 100 words and that the conditions of normality 
of distribution and homogeneity of variance are satisfied. With 
these assumptions, we have the following sums of scores (unchanged 
from the original experiment). 


Experimental Condition                          Sum of
Type      Background    Exposure Time           Scores
6 pt.     Blank         60 m. sec.               1,319
6 pt.     Blank         120 m. sec.              4,592
6 pt.     Printed       60 m. sec.               1,196
6 pt.     Printed       120 m. sec.              4,365
12 pt.    Blank         60 m. sec.               3,682
12 pt.    Blank         120 m. sec.              4,939
12 pt.    Printed       60 m. sec.               3,357
12 pt.    Printed       120 m. sec.              4,885


The sum of squares within groups is given as equal to 84,397; 
the total sum of squares is given as equal to 405,084; and, by 
calculation, you will find the sum of squares between the 8 experi- 
mental conditions equal to 320,687. Complete the analysis of 
variance and test for significance of the experimental variables and 
the various interactions. 
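As a check on the between-conditions figure given in the example (not a solution to the exercise), the sum of squares between the 8 sums can be computed with the usual formula, taking n = 50 for every sum as the example assumes.

```python
# Sums of scores for the 8 conditions of Example 2, 50 subjects each
sums = [1319, 4592, 1196, 4365, 3682, 4939, 3357, 4885]
n = 50

grand_total = sum(sums)
N = n * len(sums)                       # 400 subjects in all
correction = grand_total ** 2 / N       # (sum of all scores)^2 / N
ss_between = sum(t * t for t in sums) / n - correction

ss_within = 84397     # given in the example
ss_total = 405084     # given in the example
print(round(ss_between))                # 320687
print(round(ss_between + ss_within))    # 405084
```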

3. Here is a simple set of measures for practice in computation of the necessary sums of squares in a factorial design. Suppose that we have 2 variables each varied in 2 ways. We may indicate these variations in the manner of Example 1. The data are as follows:

             A                 B
        1        2        1        2
[The individual scores are illegible in the source.]

(a) Find the necessary sums of squares and the various values of F. (b) If A and B represent 2 different colored backgrounds, 1 and 2 represent 2 different type sizes, and the data represent measures of legibility, how would you interpret the results of the analysis?

4. Suppose that the scope of the experiment in Example 3 is extended to include still a third variable and that this is also varied in 2 ways. This variable might be the degree of illumination and may be represented by I and II. Let us assume that we now have the following results:

                 A                 B
            1        2        1        2
I           8        5       10        5
            6        8        9        7
            9       10        4        3
            9        7        8        5
            8       10        8        3
            ?        7        4        5
            6        8        3        5
            3        5        6        8
II          7        6        5        2
           10        8        7        7
            6        7        4        5
            7        6        7        7
            5        8        6        5
            7        9        8        9
            6        8       10        6
           10        9        6        6


Find the necessary sums of squares and the various values of 
F and interpret your results. 

5. Taylor (1942) gives the following data for an analysis of 
variance: 


               Test I    Test II
College A        21        24
                 19        15
                 25        13
                 24        24
                 20        30
                 25        18
                 30        25
                 30        16
                 18        25
                 18        20
College B        22        24
                 24        32
                 26        18
                 23        12
                 37        20
                 32        15
                 40        30
                 35        21
                 30        15
                 31        13


Analyze the total sum of squares into the various components 
and test the various mean squares for significance. 

6. Take the sums for the 8 groups of Example 4 and set up a 
table with plus and minus signs showing how these sums are com- 
bined to give the 7 sums of squares for the main effects and interac- 
tions. The method of doing this is described in the chapter. 




CHAPTER 13 


Complex Factorial Designs


1. INTRODUCTION 


The factorial design is not limited to experiments in which 
the variables are varied in but 2 ways, as in the examples cited 
in the last chapter. The factorial design and the analysis of 
variance may also be applied to problems in which one or more 
of the variables may be varied in several ways. It is clear, how- 
ever, that if a variable which is varied in 3 or more ways is intro- 
duced into the problem, this variable will have more than a single 
degree of freedom. Then it also follows that the interactions 
between this variable and other variables in the experiment will 
be based upon more than a single degree of freedom. The degrees 
of freedom for interaction sums of squares, it may be recalled, are 
the product of the degrees of freedom associated with the inter- 
acting variables. 

Since the method of calculating the sums of squares for 
simple interactions, when these are based upon more than 1 degree 
of freedom, differs from the methods previously described, we 
shall examine a factorial design which involves interactions with 
more than a single degree of freedom. We shall also show a 
method for the direct calculation of a second- or higher-order 
interaction. The methods to be described will be applicable to 
the calculation of any simple interaction or any second- or higher- 
order interaction, regardless of the degrees of freedom upon which 
they are based. It should be possible, then, for the reader to
generalize from the examples cited to any particular factorial
design in which he happens to be interested.


2. A 4 X 3 X 2 FACTORIAL DESIGN


Let us take as an example an experiment in which 3 variables are involved, namely, A, B, and C. Suppose that A is varied in 4 ways, B is varied in 3 ways, and C is varied in 2 ways. Then we shall have (4)(3)(2) = 24 combinations of variables, each combination corresponding to a particular experimental condition. One replication of the experiment will thus require 24 subjects, and the 23 degrees of freedom available with 1 replication would be allocated to the following sums of squares:

Main variables:          A              3
                         B              2
                         C              1
Simple interactions:     A X B          6
                         A X C          3
                         B X C          2
Triple interaction:      A X B X C      6


If we had 120 subjects, we could assign at random 5 subjects 
to each of the 24 experimental conditions. We would thus have 
4 degrees of freedom within each of the 24 experimental conditions 
or (4)(24) = 96 degrees of freedom for the variation of subjects 
treated alike. The sum of squares for the 96 degrees of freedom
would be the sum of squares within groups which would be used 
to derive the mean square for testing the significance of the main 
experimental variables, the simple interactions, and the triple 
interaction. 
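The allocation follows the rule stated in the introduction: each main variable has one fewer degree of freedom than the number of ways it is varied, and each interaction has the product of the degrees of freedom of its variables. A short Python sketch (it uses `math.prod`, available from Python 3.8) that applies the rule and confirms the 96 within-groups degrees of freedom for 5 subjects per condition:

```python
from itertools import combinations
from math import prod

levels = {"A": 4, "B": 3, "C": 2}            # ways in which each variable is varied
df_main = {name: k - 1 for name, k in levels.items()}

# Interaction df = product of the df of the interacting variables.
df = {}
for order in range(1, len(levels) + 1):
    for combo in combinations(levels, order):
        df[" X ".join(combo)] = prod(df_main[name] for name in combo)

n_conditions = prod(levels.values())         # 24
assert sum(df.values()) == n_conditions - 1  # all 23 df accounted for

subjects_per_condition = 5
df_within = n_conditions * (subjects_per_condition - 1)   # 96
print(df, df_within)
```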


3. CALCULATION OF THE SUM OF SQUARES 


The method of calculating the sum of squares within groups
would be exactly the same as in examples previously cited. We 
would calculate the total sum of squares and the sum of squares 
between the 24 experimental conditions and obtain the sum of 
squares within groups by subtraction. Or we could calculate 
the sum of squares within each of 24 groups separately, and the 
sum of these sums of squares would be equal to the sum of squares 
within groups. Let us suppose that the sum of squares within groups has already been calculated and has been found equal to 1,198, and that we have added together the scores or measures of the 5 subjects in each of the experimental conditions to obtain the sums entered in Table 44. Each sum in the table is based upon an n of 5.

TABLE 44. Outcomes of a 4 X 3 X 2 Factorial Design

               A₁     A₂     A₃     A₄      Sum
C₁    B₁       60     90     94     86      330
      B₂       54     92     98     96      340
      B₃       70     76     80     60      286
C₂    B₁       58     72     78     84      292
      B₂       76     82     74     64      296
      B₃       66     56     72     78      272
Sum           384    468    496    468    1,816

The sum of squares between the 24 experimental conditions, for which we have the sums in the table, would be given by

$$\frac{(60)^2}{5} + \frac{(54)^2}{5} + \frac{(70)^2}{5} + \cdots + \frac{(78)^2}{5} - \frac{(1{,}816)^2}{120} = 783.47$$

It is this sum of squares, 783.47, based upon 23 degrees of freedom, that is to be further analyzed into the component parts enumerated earlier.
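The same between-sums formula is used repeatedly in the rest of this chapter, so it is convenient to write it once. In the sketch below, `ss_between` is a hypothetical helper: for sums T, each based on n observations, it returns ΣT²/n minus (ΣT)²/N, with N equal to n times the number of sums. Applied to the 24 cell sums of Table 44 it reproduces 783.47.

```python
def ss_between(sums, n):
    """Sum of squares between a set of sums, each based on n observations.

    Returns sum(T**2)/n - (grand total)**2 / N, where N = n * number of sums.
    """
    grand = sum(sums)
    return sum(t * t for t in sums) / n - grand ** 2 / (n * len(sums))


# Table 44 cell sums; rows C1B1, C1B2, C1B3, C2B1, C2B2, C2B3; columns A1-A4
table_44 = [
    [60, 90, 94, 86],
    [54, 92, 98, 96],
    [70, 76, 80, 60],
    [58, 72, 78, 84],
    [76, 82, 74, 64],
    [66, 56, 72, 78],
]
cells = [t for row in table_44 for t in row]
print(len(cells), sum(cells), round(ss_between(cells, n=5), 2))   # 24 1816 783.47
```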

From the data of Table 44 we may set up a table for variable A and variable C, ignoring the B classification. Thus we obtain Table 45.

TABLE 45. Outcomes of the 4 X 3 X 2 Factorial Design with the B
Classification Ignored

          A₁     A₂     A₃     A₄      Sum
C₁       184    258    272    242      956
C₂       200    210    224    226      860
Sum      384    468    496    468    1,816

The two sums 956 and 860 are the sums of scores for the C variable, and by means of formula (61) we may obtain the sum of squares for C. Keeping in mind that the 2 sums are each based upon an n of 60 observations, we have

$$\frac{(956 - 860)^2}{(2)(60)} = \frac{(96)^2}{120} = 76.80$$


The sum of squares for the A variable will be based upon the sums 384, 468, 496, and 468 given at the bottom of Table 45. Each of these sums is based upon an n of 30 observations, and the sum of squares will be given by

$$\frac{(384)^2}{30} + \frac{(468)^2}{30} + \frac{(496)^2}{30} + \frac{(468)^2}{30} - \frac{(1{,}816)^2}{120} = 235.20$$

The sum of squares for the interaction of A and C may also be obtained from Table 45. Calculate first the sum of squares between the 8 sums entered in the cells of the table, keeping in mind that each of these sums is based upon an n equal to 15. Thus,

$$\frac{(184)^2}{15} + \frac{(200)^2}{15} + \frac{(258)^2}{15} + \cdots + \frac{(226)^2}{15} - \frac{(1{,}816)^2}{120} = 405.87$$

Then the sum of squares for the interaction of A and C may be obtained by subtracting the sum of squares for A and the sum of squares for C, which we have already calculated, from the sum of squares between the 8 sums, 405.87. Thus the interaction sum of squares for variables A and C will be given by

$$405.87 - 235.20 - 76.80 = 93.87$$

We may now go back to the data of Table 44 and set up another table showing the classification of A and B, ignoring the classification of C. In this manner we obtain the data of Table 46.

TABLE 46. Outcomes of the 4 X 3 X 2 Factorial Design with the C
Classification Ignored

          A₁     A₂     A₃     A₄      Sum
B₁       118    162    172    170      622
B₂       130    174    172    160      636
B₃       136    132    152    138      558
Sum      384    468    496    468    1,816

The sum of squares for variable B may be obtained from the sums 622, 636, and 558. Since each of these sums is based upon an n of 40 cases, we will have

$$\frac{(622)^2}{40} + \frac{(636)^2}{40} + \frac{(558)^2}{40} - \frac{(1{,}816)^2}{120} = 86.47$$

To obtain the sum of squares for the interaction of A and B, we first calculate the sum of squares between the 12 sums in the cells of Table 46. Since each of these sums is based upon an n of 10 cases, we have

$$\frac{(118)^2}{10} + \frac{(130)^2}{10} + \frac{(136)^2}{10} + \cdots + \frac{(138)^2}{10} - \frac{(1{,}816)^2}{120} = 425.87$$

The sums of squares for variable A and for variable B have already been calculated, and by subtraction we obtain the sum of squares for the interaction of these 2 variables. Thus,

$$425.87 - 235.20 - 86.47 = 104.20$$


To obtain the sum of squares for the interaction of variables B and C, we must set up still another table from the data of Table 44. Ignoring the classification for variable A, we obtain the data which have been entered in Table 47.

TABLE 47. Outcomes of the 4 X 3 X 2 Factorial Design with the A
Classification Ignored

          B₁     B₂     B₃      Sum
C₁       330    340    286      956
C₂       292    296    272      860
Sum      622    636    558    1,816

The sum of squares between the 6 sums entered in the cells of the table is found as before. Since each of these sums is based upon 20 cases, we have

$$\frac{(330)^2}{20} + \frac{(292)^2}{20} + \frac{(340)^2}{20} + \frac{(296)^2}{20} + \frac{(286)^2}{20} + \frac{(272)^2}{20} - \frac{(1{,}816)^2}{120} = 175.87$$

We have already calculated the sum of squares for variable B and the sum of squares for variable C. Hence, by subtracting these sums of squares from 175.87, we may obtain the sum of squares for the interaction between variables B and C. Thus,

$$175.87 - 86.47 - 76.80 = 12.60$$


The sum of squares for the triple interaction is obtained by subtracting the sums of squares for variables A, B, and C and the simple interactions from the sum of squares based upon the variation of the 24 experimental groups. Thus,

$$783.47 - 235.20 - 86.47 - 76.80 - 104.20 - 93.87 - 12.60 = 174.33$$
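All the sums of squares of this section can be obtained from the 24 cell sums of Table 44 by ignoring one or more classifications, exactly as Tables 45, 46, and 47 were formed. The sketch below automates that collapsing; the names `margin_ss`, `SIZES`, and `N_PER_CELL` are illustrative, not from the text. Each collapsed sum pools the cells of the ignored variables, so its n is 5 times the product of their numbers of levels.

```python
from itertools import product

SIZES = (4, 3, 2)    # levels of A, B, C
N_PER_CELL = 5

# Table 44 cell sums, indexed table_44[c][b][a]
table_44 = [
    [[60, 90, 94, 86], [54, 92, 98, 96], [70, 76, 80, 60]],   # C1: rows B1, B2, B3
    [[58, 72, 78, 84], [76, 82, 74, 64], [66, 56, 72, 78]],   # C2: rows B1, B2, B3
]
cell = {(a, b, c): table_44[c][b][a]
        for a, b, c in product(range(4), range(3), range(2))}


def ss_between(sums, n):
    """sum(T**2)/n - (grand total)**2/N for sums each based on n observations."""
    grand = sum(sums)
    return sum(t * t for t in sums) / n - grand ** 2 / (n * len(sums))


def margin_ss(keep):
    """SS between the sums left after ignoring every variable not listed in keep.

    keep holds variable indices: 0 = A, 1 = B, 2 = C.
    """
    totals = {}
    for idx, t in cell.items():
        key = tuple(idx[i] for i in keep)
        totals[key] = totals.get(key, 0) + t
    n = N_PER_CELL
    for i, size in enumerate(SIZES):
        if i not in keep:
            n *= size     # each collapsed sum pools `size` cells
    return ss_between(list(totals.values()), n)


ss = {"A": margin_ss((0,)), "B": margin_ss((1,)), "C": margin_ss((2,))}
ss["A X B"] = margin_ss((0, 1)) - ss["A"] - ss["B"]
ss["A X C"] = margin_ss((0, 2)) - ss["A"] - ss["C"]
ss["B X C"] = margin_ss((1, 2)) - ss["B"] - ss["C"]
ss_cells = margin_ss((0, 1, 2))                  # 783.47, the 24 cells
ss["A X B X C"] = ss_cells - sum(ss.values())   # by subtraction, as in the text

for source, value in ss.items():
    print(f"{source:10s} {value:8.2f}")
```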


4. DIRECT CALCULATION OF A SECOND- OR HIGHER-ORDER
INTERACTION


In some experiments it may be necessary to calculate directly 
the sum of squares for a second- or higher-order interaction. 
We may illustrate a method of obtaining the sum of squares for 
a second-order interaction with the problem at hand. The proce- 
dure will be exactly the same for any other second- or higher- 
order interaction, and the method is thus general in its application. 

The interaction to be calculated is that of A x B x C. 
Examine the data of Table 44. The cell entries there are the 
sums of scores for 5 subjects. These data may be rearranged in 
the manner of Table 48. We have first separated the groups 
according to the A variable and then in terms of the B and C 
variables. Consider only Sub Table I. From the data in this 
table, it would be possible to calculate a sum of squares between 
the 6 cells. A second sum of squares could be calculated between 
the 2 rows, and a third sum of squares between the 3 columns. 
The row sum of squares would be the sum of squares for the C variable under condition A₁. The column sum of squares would be the sum of squares for the B variable under condition A₁. If we subtract these two sums of squares from the sum of squares between the 6 cells, we shall have an interaction sum of squares. This procedure involves nothing new. We have used this method of calculation before in obtaining an interaction sum of squares. The interaction obtained from Sub Table I, however, is the interaction for B X C under condition A₁, and this may be symbolized by A₁(B X C).


TABLE 48. The Sub Tables for the Direct Calculation of the Second-Order
Interaction

SUB TABLE I                             SUB TABLE II
A₁                                      A₂
        B₁     B₂     B₃    Sum                 B₁     B₂     B₃    Sum
C₁      60     54     70    184         C₁      90     92     76    258
C₂      58     76     66    200         C₂      72     82     56    210
Sum    118    130    136    384         Sum    162    174    132    468

SUB TABLE III                           SUB TABLE IV
A₃                                      A₄
        B₁     B₂     B₃    Sum                 B₁     B₂     B₃    Sum
C₁      94     98     80    272         C₁      86     96     60    242
C₂      78     74     72    224         C₂      84     64     78    226
Sum    172    172    152    496         Sum    170    160    138    468

SUB TABLE V
A₁ + A₂ + A₃ + A₄
        B₁     B₂     B₃      Sum
C₁     330    340    286      956
C₂     292    296    272      860
Sum    622    636    558    1,816


The process described could be repeated for each of the other
sub tables. We would thus have the interactions A₁(B × C),
A₂(B × C), A₃(B × C), and A₄(B × C). The necessary
calculations for these interaction sums of squares would be as follows:


A₁(B × C)

$$
\begin{aligned}
\text{Between cells} &= \frac{(60)^2 + (54)^2 + (70)^2 + (58)^2 + (76)^2 + (66)^2}{5} - \frac{(384)^2}{30} = 67.20\\
\text{Rows} &= \frac{(184)^2 + (200)^2}{15} - \frac{(384)^2}{30} = 8.53\\
\text{Columns} &= \frac{(118)^2 + (130)^2 + (136)^2}{10} - \frac{(384)^2}{30} = 16.80\\
\text{Interaction: } A_1(B \times C) &= 67.20 - 8.53 - 16.80 = 41.87
\end{aligned}
$$

A₂(B × C)

$$
\begin{aligned}
\text{Between cells} &= \frac{(90)^2 + (92)^2 + (76)^2 + (72)^2 + (82)^2 + (56)^2}{5} - \frac{(468)^2}{30} = 176.00\\
\text{Rows} &= \frac{(258)^2 + (210)^2}{15} - \frac{(468)^2}{30} = 76.80\\
\text{Columns} &= \frac{(162)^2 + (174)^2 + (132)^2}{10} - \frac{(468)^2}{30} = 93.60\\
\text{Interaction: } A_2(B \times C) &= 176.00 - 76.80 - 93.60 = 5.60
\end{aligned}
$$

A₃(B × C)

$$
\begin{aligned}
\text{Between cells} &= \frac{(94)^2 + (98)^2 + (80)^2 + (78)^2 + (74)^2 + (72)^2}{5} - \frac{(496)^2}{30} = 116.27\\
\text{Rows} &= \frac{(272)^2 + (224)^2}{15} - \frac{(496)^2}{30} = 76.80\\
\text{Columns} &= \frac{(172)^2 + (172)^2 + (152)^2}{10} - \frac{(496)^2}{30} = 26.67\\
\text{Interaction: } A_3(B \times C) &= 116.27 - 76.80 - 26.67 = 12.80
\end{aligned}
$$

A₄(B × C)

$$
\begin{aligned}
\text{Between cells} &= \frac{(86)^2 + (96)^2 + (60)^2 + (84)^2 + (64)^2 + (78)^2}{5} - \frac{(468)^2}{30} = 188.80\\
\text{Rows} &= \frac{(242)^2 + (226)^2}{15} - \frac{(468)^2}{30} = 8.53\\
\text{Columns} &= \frac{(170)^2 + (160)^2 + (138)^2}{10} - \frac{(468)^2}{30} = 53.60\\
\text{Interaction: } A_4(B \times C) &= 188.80 - 8.53 - 53.60 = 126.67
\end{aligned}
$$


Summing the interactions of B × C under the separate A
conditions, we have

$$\sum A(B \times C) = 41.87 + 5.60 + 12.80 + 126.67 = 186.94$$


Now we have already calculated the interaction between variables 
B and C under the combined A conditions, i.e., from the data of 
Table 47, and this will be identical with the interaction obtained 
from Sub Table V. This interaction sum of squares was found 
to be equal to 12.60. Then the second-order interaction sum of 
squares A x B X C may be obtained by subtracting the B x C 
interaction sum of squares from the sum of the interactions for 
B x C under the separate A conditions. Thus, 


$$\text{Interaction: } A \times B \times C = \sum A(B \times C) - B \times C \qquad (63)$$

Substituting in the above formula, we obtain

$$\text{Interaction: } A \times B \times C = 186.94 - 12.60 = 174.34$$


which checks, within errors of rounding, with the value previously 
found for this sum of squares. 
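The direct method can be checked by machine. The following is a minimal Python sketch that assumes only the cell totals of Table 48, each the sum of 5 scores. The helper name `interaction_ss` is our own illustrative choice and does not come from the text. When the arithmetic is carried at full precision, the four interactions sum to 186.93 and the second-order sum of squares is 174.33. The 186.94 and 174.34 given above come from adding values that had already been rounded to two places.

```python
# Cell totals from Table 48 (each total is the sum of 5 scores).
# CELLS[a][c][b] is the total for level A(a+1), C(c+1), B(b+1).
CELLS = [
    [[60, 54, 70], [58, 76, 66]],  # A1
    [[90, 92, 76], [72, 82, 56]],  # A2
    [[94, 98, 80], [78, 74, 72]],  # A3
    [[86, 96, 60], [84, 64, 78]],  # A4
]
N_PER_CELL = 5


def interaction_ss(table, n):
    """Row-by-column interaction SS for a two-way table of totals,
    where every cell total is the sum of n scores."""
    n_rows, n_cols = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    correction = sum(row_totals) ** 2 / (n * n_rows * n_cols)
    between_cells = sum(x ** 2 for row in table for x in row) / n - correction
    rows = sum(t ** 2 for t in row_totals) / (n * n_cols) - correction
    cols = sum(t ** 2 for t in col_totals) / (n * n_rows) - correction
    return between_cells - rows - cols


# A_i(B x C): one interaction per sub table (Sub Tables I-IV).
within_a = [interaction_ss(sub, N_PER_CELL) for sub in CELLS]

# Sub Table V: add the sub tables cell by cell; each total now covers 4 * 5 scores.
combined = [[sum(sub[c][b] for sub in CELLS) for b in range(3)] for c in range(2)]
bc = interaction_ss(combined, N_PER_CELL * len(CELLS))

abc = sum(within_a) - bc  # formula (63)

print([round(x, 2) for x in within_a])  # -> [41.87, 5.6, 12.8, 126.67]
print(round(sum(within_a), 2), round(bc, 2), round(abc, 2))  # -> 186.93 12.6 174.33
```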

The method described above for calculating the sum of 
squares for a second-order interaction can be varied to fit the 
needs of a particular design. For example, the interaction sum 
of squares given by formula (63) might have been obtained by 
calculating the A X C interactions under the separate B condi- 
tions, summing, and then subtracting the A X C interaction for 
the combined B conditions. Similarly, we could have calculated 
the A x B interactions under the separate C conditions, added 
these, and then subtracted the A X B interaction for the combined 
C conditions. Thus, in general, a second-order interaction will 
be given by 

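$$
\begin{aligned}
\text{Interaction: } A \times B \times C &= \sum A(B \times C) - B \times C\\
&= \sum B(A \times C) - A \times C\\
&= \sum C(A \times B) - A \times B
\end{aligned}
$$

and a third-order interaction by

$$
\begin{aligned}
\text{Interaction: } A \times B \times C \times D &= \sum A(B \times C \times D) - B \times C \times D\\
&= \sum B(A \times C \times D) - A \times C \times D\\
&= \sum C(A \times B \times D) - A \times B \times D\\
&= \sum D(A \times B \times C) - A \times B \times C
\end{aligned}
$$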




and a similar series of equations may be written for any higher- 
order interaction. A general proof of these equations is given by 
Edwards and Horst (1950). 
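The three forms of the second-order equation can be compared numerically on the present data. The Python sketch below is illustrative only and assumes the Table 48 cell totals. The functions `cell` and `second_order` are hypothetical helpers written for this example. Each route splits the data on one variable, sums the interactions of the other two within each level, and subtracts the interaction of the combined table. All three routes should return the same value, about 174.33.

```python
# Cell totals from Table 48; CELLS[a][c][b], each the sum of 5 scores.
CELLS = [
    [[60, 54, 70], [58, 76, 66]],
    [[90, 92, 76], [72, 82, 56]],
    [[94, 98, 80], [78, 74, 72]],
    [[86, 96, 60], [84, 64, 78]],
]
LEVELS = {"A": 4, "B": 3, "C": 2}
N_PER_CELL = 5


def interaction_ss(table, n):
    """Row-by-column interaction SS for a table of totals of n scores each."""
    n_rows, n_cols = len(table), len(table[0])
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    corr = sum(row_totals) ** 2 / (n * n_rows * n_cols)
    cells = sum(x ** 2 for r in table for x in r) / n - corr
    rows = sum(t ** 2 for t in row_totals) / (n * n_cols) - corr
    cols = sum(t ** 2 for t in col_totals) / (n * n_rows) - corr
    return cells - rows - cols


def cell(idx):
    """Total for a dict of level indices such as {"A": 0, "B": 2, "C": 1}."""
    return CELLS[idx["A"]][idx["C"]][idx["B"]]


def second_order(split, rows, cols):
    """Sum of the rows x cols interactions within each level of `split`,
    minus the rows x cols interaction of the combined table."""
    pieces = []
    for s in range(LEVELS[split]):
        table = [[cell({split: s, rows: r, cols: k}) for k in range(LEVELS[cols])]
                 for r in range(LEVELS[rows])]
        pieces.append(interaction_ss(table, N_PER_CELL))
    combined = [[sum(cell({split: s, rows: r, cols: k}) for s in range(LEVELS[split]))
                 for k in range(LEVELS[cols])]
                for r in range(LEVELS[rows])]
    return sum(pieces) - interaction_ss(combined, N_PER_CELL * LEVELS[split])


for split, rows, cols in [("A", "C", "B"), ("B", "A", "C"), ("C", "A", "B")]:
    print(split, round(second_order(split, rows, cols), 2))
```

The "C" route needs only two split tables. This is the economy discussed in the next paragraph.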

Since any given higher-order interaction may be found in 
a variety of ways, it is worthwhile to examine the data to determine 
the set of sub tables which will require the least effort as far as 
calculations are concerned. We have not, for example, taken the 
most economical set of sub tables in the present problem. The
calculations would be reduced considerably if we had taken
A × B × C = ΣC(A × B) − A × B instead of taking
A × B × C = ΣA(B × C) − B × C. The first equation would require
only 2 sub tables, one for C₁(A × B) and one for C₂(A × B),
whereas the second equation, as we have seen, requires 4 sub
tables.


5. SUMMARY OF THE ANALYSIS


We have assumed that the within-groups sum of squares
has already been calculated and found equal to 1,198.00. This 
value, along with the other sums of squares which we have just 
calculated, is entered in Table 49 which summarizes the analysis. 


TABLE 49. Analysis of Variance of the 4 × 3 × 2 Factorial Design Assuming
Replication

Source of Variation          Sum of Squares     df    Mean Square       F
Between A                           235.20        3         78.40    6.28**
Between B                            86.47        2         43.24    3.46*
Between C                            76.80        1         76.80    6.15*
Interaction: A × B                  104.20        6         17.37    1.39
Interaction: A × C                   93.87        3         31.29    2.51
Interaction: B × C                   12.60        2          6.30
Interaction: A × B × C              174.33        6         29.06    2.33*
Within                            1,198.00       96         12.48

Total                             1,981.47      119


We may assume that all the variables investigated have been
varied systematically and therefore do not represent random selections
from any defined populations. Thus, the sum of squares within
groups, when divided by the 96 degrees of freedom associated
with it, provides us with the estimate of experimental error for 
testing the significance of the other mean squares. The values 
of F which have been computed must be interpreted in terms of 
the number of degrees of freedom involved, and these vary
depending upon the mean square being tested for significance. To
simplify matters, we have marked with a single star the values of
F which are significant at the 5 per cent point and with a double
star those which are significant at the 1 per cent point. The
interpretation of the significant values of F would be as described
in the preceding chapter. Our primary purpose here, to illustrate
how higher-order interactions and simple interactions based upon
more than a single degree of freedom can be calculated, has been
achieved.
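The mean squares and F ratios of Table 49 follow directly from the sums of squares and their degrees of freedom. The short Python sketch below reproduces them. It assumes 174.33 for the second-order interaction, which is the value computed earlier in the section, and uses the within-groups mean square as the error term, as the text does. The standard library has no F distribution, so significance still has to be judged from a table of F as in the book. A library such as SciPy (`scipy.stats.f`) could supply the probabilities instead.

```python
# (source, sum of squares, df) as in Table 49.
EFFECTS = [
    ("Between A", 235.20, 3),
    ("Between B", 86.47, 2),
    ("Between C", 76.80, 1),
    ("Interaction: A x B", 104.20, 6),
    ("Interaction: A x C", 93.87, 3),
    ("Interaction: B x C", 12.60, 2),
    ("Interaction: A x B x C", 174.33, 6),
]
SS_WITHIN, DF_WITHIN = 1198.00, 96

ms_within = SS_WITHIN / DF_WITHIN  # estimate of experimental error
f_ratios = {}
for name, ss, df in EFFECTS:
    ms = ss / df
    f_ratios[name] = ms / ms_within
    print(f"{name:<24}{ms:8.2f}{f_ratios[name]:7.2f}")

# The components should add back to the total line of the table.
total_ss = sum(ss for _, ss, _ in EFFECTS) + SS_WITHIN
total_df = sum(df for _, _, df in EFFECTS) + DF_WITHIN
print(f"{'Total':<24}{total_ss:8.2f}  df = {total_df}")
```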


6. THE USE OF INTERACTIONS AS ERROR TERMS INSTEAD OF 
THE USUAL ESTIMATES OF ERROR 


Let us assume that in the experiment described previously 
the A variable corresponds to 4 schools, the B variable corresponds 
to 3 instructors, and the C variable corresponds to 2 methods of 
instruction. Each instructor teaches by both methods in each
of the 4 schools. We again have 4 × 3 × 2 = 24 combinations
of variables, and we shall assume that 10 subjects have been as- 
signed at random to each combination. The analysis of variance 
would result in the following mean squares with associated degrees 
of freedom: 

Source of Variation                      df
Instructors                               2
Methods                                   1
Schools                                   3
Instructors × methods                     2
Instructors × schools                     6
Methods × schools                         3
Instructors × methods × schools           6
Residual within groups                  216

Total                                   239


Let us further assume that all the mean squares are significant
when tested against the residual mean square. This would mean,
first with respect to the main variables: that significant differences
are present among instructors; that the 2 methods differ
significantly; and that there are significant differences among schools.
The interaction between instructors and methods, if significant, 
would mean that the differences among the instructors are de- 
pendent upon the method used or that the difference between the 
methods depends upon the instructor variable. A significant 
interaction between instructors and schools would mean that the 
differences observed among instructors are dependent upon the 
schools or that the differences observed among the schools are 
dependent upon the instructors. A significant methods-and- 
schools interaction would mean that the difference observed be- 
tween the methods is dependent upon the schools or that the 
differences among the schools are dependent upon the method of 
instruction. 

If the second-order interaction is significant, this would 
mean that the differences observed among instructors are depend- 
ent upon the methods and the schools; that the differences observed 
among the schools are dependent upon the instructors and the 
methods; or that the difference observed between the methods is 
dependent upon the schools and instructors. Now, in view of 
a significant second-order interaction, our conclusions concerning 
the main variables, consisting of schools, instructors, and methods, 
are somewhat limited. We know that there are significant differ- 
ences present for these 3 variables, but we know also, from the 
significance of the interaction, that the difference observed for, 
let us say, methods, is to some extent dependent upon the schools 
and instructors. 

If our interest is only in the 2 particular methods, the 3 
particular instructors, and the 4 particular schools involved in 
the experiment, then our analysis and the tests of significance of 
the various mean squares, using the residual mean square as an 
error term, are appropriate. Each mean square has been evalu- 
ated, and the conclusions reached are definite. Examination of 
the means for the various combinations of experimental conditions 
would probably reveal that, in a particular school, one method is 
more effective than another, when used by a particular instructor,
and we could make recommendations accordingly.


But in an experiment such as this, our primary interest may 
be in the difference observed between the 2 methods of instruction 
which we have used. Furthermore, we may wish to make recom- 
mendations beyond the particular schools investigated. Can we 
say that a particular method will probably be more effective, on 
the average, for all schools, including those we have not actually 
investigated? 

Let us suppose that we have selected the instructors to
represent particular types of personalities or abilities. The three
used in the experiment are definitely not a random sample from 
any defined population. Nor have we selected at random from 
any population of methods of instruction; instead we have picked 
2 particular methods for investigation. But it is possible that 
we might have made schools a random variable by selecting the 
schools at random from a defined population of schools for a par- 
ticular city, county, or school district. If this had been our in- 
tention, of course, we would undoubtedly have taken a larger 
sample than the 4 schools at hand. Let us suppose, however, 
that the schools have been selected at random. 

We now have the case mentioned earlier, where one of our 
variables may be considered a random sample from a defined 
population. In this sense the schools consist merely of replications 
of the experimental design in which the main variables are the 
instructors (varied according to type) and methods. Under this 
condition the highest-order interaction might possibly be regarded 
as the appropriate error term for testing the significance of the 
next lower order interactions. But before proceeding on this 
basis, another condition must hold true; the interaction must be 
significantly larger than the residual mean square within groups. 
It cannot, of course, be smaller except by chance. If it is smaller, 
the residual mean square within groups should be used in testing 
the significance of the next level of interactions. 

Let us assume, in the present instance, that the second- 
order interaction is significant when tested against the mean 
square within groups. We now proceed to test the next level of 
interactions against the second-order interaction. Whichever 


1 See p. 217, footnote 3. 




ones of these prove not to be significant when tested against the 
second-order interaction may be combined with the second-order 
interaction to give us an error term based upon a larger number of 
degrees of freedom. 

Under the assumptions that we have made, it is quite likely 
that if the second-order interaction is significant when tested 
against the residual mean square within groups, some of the simple 
interactions will not prove to be significant when tested against 
the second-order interaction. The obvious reason for this is that
the mean square for the second-order interaction will be larger 
than the residual mean square within groups. The F’s thus 
obtained, besides being based upon a smaller number of degrees 
of freedom, will be smaller than in the first instance. 

Let us suppose that only the simple interaction involving 
instructors and methods is significant when tested against the 
second-order interaction. The nonsignificance of the interaction 
between methods and schools and the interaction between instruc- 
tors and schools, of course, means that we no longer have any 
basis for inferring that the difference observed between the methods 
is dependent upon the schools, or that the differences observed 
among the schools are dependent upon the methods. Similarly, 
the evidence would now indicate that the differences among the 
instructors are not dependent upon the schools, or that the differ- 
ences among the schools are not dependent upon the instructors. 
The sums of squares for these 2 interactions may be pooled with 
the sum of squares for the second-order interaction, along with 
their associated degrees of freedom. The analysis would now
take this form:


Source of Variation                      df
Instructors                               2
Methods                                   1
Schools                                   3
Instructors × methods                     2
Error (pooled interactions)              15
Residual within groups                  216

Total                                   239
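The degrees of freedom in both tables follow a simple rule. Each term receives the product of (levels − 1) over the variables it contains, and the within-groups line receives cells × (subjects per cell − 1). The minimal Python sketch below applies that rule. The function name `df_partition` is hypothetical. The pooling step encodes only the outcome assumed in the text, in which every interaction involving the randomly selected schools is pooled.

```python
from itertools import combinations
from math import prod

LEVELS = {"Instructors": 3, "Methods": 2, "Schools": 4}
N_PER_CELL = 10


def df_partition(levels, n_per_cell):
    """df for every main effect and interaction of a complete factorial,
    plus the within-groups df when each cell holds n_per_cell subjects."""
    names = list(levels)
    table = {}
    for order in range(1, len(names) + 1):
        for combo in combinations(names, order):
            table[" x ".join(combo)] = prod(levels[f] - 1 for f in combo)
    table["Residual within groups"] = prod(levels.values()) * (n_per_cell - 1)
    return table


table = df_partition(LEVELS, N_PER_CELL)
for source, df in table.items():
    print(f"{source:<34}{df:4d}")
print(f"{'Total':<34}{sum(table.values()):4d}")

# Pool the interactions involving schools (the random variable), as assumed above.
pooled = sum(df for source, df in table.items()
             if "Schools" in source and " x " in source)
print("Error (pooled interactions) df =", pooled)
```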


Now, how shall we test the significance of the mean squares
for instructors, methods, and schools? If we could assume that
either instructors or methods constituted a random sample from 
a population of instructors or a population of methods, the in- 
structor-and-methods interaction might be considered an ap- 
propriate error term for testing the significance of the mean square 
for instructors and the mean square for methods. This, however,
is not a plausible assumption. The appropriate error term is the
pooled interaction mean square based upon 15 degrees of freedom,
which includes all the interactions involving the variable that
we have assumed to be randomly selected, i.e., schools. If we
now test the mean squares for instructors, methods, and schools 
against the pooled interaction mean square, and if they are sig- 
nificant, what conclusions can be drawn? 

It is the methods mean square that is of primary interest, 
and its significance would indicate that the difference between 
methods was not dependent upon, or could not be accounted for 
in terms of, differences in the schools. A similar statement could 
be made concerning the instructors if this mean square was
significant. In view of a significant interaction between methods
and instructors, however, it would still be necessary to qualify
our recommendations; the difference between the methods is still
dependent upon the instructors.² But the means for the various
instructors teaching the various methods could be examined for
whatever insight this might give us as to the nature of the
interaction.

The analysis which we have described is dependent upon 
a number of considerations, and these should perhaps be empha- 
sized once more. One condition for the use of an interaction or 
a pooled interaction mean square instead of the usual residual 
mean square within groups is that the interaction mean square 


2 What if all the simple interactions had proved to be significant when
tested against the second-order interaction? In this case, the interaction
between methods and schools might be used to test the significance of the
methods mean square, and the interaction between instructors and schools
might be used to test the significance of the mean square for instructors. We
should keep in mind that in following this procedure our interest is in being
able to generalize concerning the methods, for example, in the population of
schools.




be larger than the residual mean square. The interaction mean 
square should never be used if it is smaller than the residual mean 
square, for it can be smaller only by chance. Furthermore, and 
this is most important, it is necessary that the categories of one 
or more of the variables in the experimental design be a random 
selection from the population being sampled.³ In the experiment
discussed, for example, it would be necessary for the schools used
in the experiment to be selected at random from a defined
population of schools. In this case the categories of the randomly selected
variable may be regarded as replications of the experiment, and 
there is some justification for the use of the interaction as an 
error term instead of the residual mean square within groups. 
There are other experimental designs in which an interaction 
mean square is used as an error term, and these are discussed in 
the sections which follow and in subsequent chapters. 


7. FACTORIAL DESIGNS WITHOUT REPLICATION 


In some complex experiments, which involve many possible 
combinations of experimental variables and consequently many 
experimental conditions, replication is not used, and the sums of 
squares for higher-order interactions are pooled along with the 
degrees of freedom associated with them to obtain an estimate 
of experimental error. The mean square thus arrived at is used 
in the manner in which the mean square based upon the variation 
within groups has been used in the experiments described, i.e., as 
an estimate of the uncontrolled variation against which to test the 
significance of the other mean squares. 

An example of this design is to be found in an experiment 
by Crutchfield (1938), in which 5 variables were each varied in 
3 ways in an investigation of “behavior potentials.” Animals 
were placed in a pulling compartment in which was a string ar- 
ranged by pulleys to a food pan. By pulling on the string the 


3 This condition will not be met by argument after the experiment has 
been carried through to completion. For example, it would be illogical to 
argue that the 2 particular methods of instruction selected for investigation
have been randomly selected from a population of methods.




animals could pull the food pan next to the compartment and thus 
eat. A friction device was used to increase or decrease the force 
required for pulling the food pan, and behavior was studied under 
all possible combinations of the experimental variables. 

Variable A was the length of string attached to the food 
pan, and this was varied by the use of 60-cm., 120-cm., and
240-cm. lengths. Variable B was the force required to pull the food
pan in on the training trials, and this was varied by using a low, 
medium, and high setting of the friction device. Variable C 
was the number of training trials given the animals, and this was 
varied by giving 30, 60, and 90 trials. Variable D consisted of 
the number of hours between the crucial test trial and the last 
feeding period. This was varied with intervals of 12 hr., 24 hr., 
and 48 hr. The final variable E was the force required to pull 
the food pan during the crucial trials, and this was varied in the 
same way as during the training trials. 

By varying each of the 5 variables in 3 ways, a total of 
(3)(3)(3)(3)(3) = 243 combinations of the variables are possible.
One replication of the experiment, assigning 1 animal to each 
experimental condition, would thus require a total of 243 animals. 
Each additional replication would require another 243 animals. 
Crutchfield decided to forego any additional replications and to 
use as an error term a mean square based upon the higher-order 
interactions. 

Each of the experimental variables will have 2 degrees of 
freedom, the 5 experimental variables thus accounting for a total 
of 10 degrees of freedom. The simple interactions, of which there 
will be 10, will each be based upon (2) (2) = 4 degrees of freedom, 
accounting for a total of 40 degrees of freedom. The higher-order 
interactions will account for the remainder of the 242 degrees of 
freedom. Thus, for the higher-order interactions there will be
242 − 50 = 192 degrees of freedom. The complete division of
the degrees of freedom is as follows: 


Source of Variation                                    df
Main variables               A                          2
                             B                          2
                             C                          2
                             D                          2
                             E                          2
Simple interactions          A × B                      4
                             A × C                      4
                             A × D                      4
                             A × E                      4
                             B × C                      4
                             B × D                      4
                             B × E                      4
                             C × D                      4
                             C × E                      4
                             D × E                      4
Second-order interactions    A × B × C                  8
                             A × B × D                  8
                             A × B × E                  8
                             A × C × D                  8
                             A × C × E                  8
                             A × D × E                  8
                             B × C × D                  8
                             B × C × E                  8
                             B × D × E                  8
                             C × D × E                  8
Third-order interactions     A × B × C × D             16
                             A × B × C × E             16
                             A × B × D × E             16
                             A × C × D × E             16
                             B × C × D × E             16
Fourth-order interaction     A × B × C × D × E         32
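The partition above can be generated mechanically for 5 variables, each varied in 3 ways. The Python sketch below, offered as an illustration, counts the terms of each order and the degrees of freedom each carries. It also carries out the step described in the next section: setting aside the three second-order interactions that contain both A and B, which leaves the 168 degrees of freedom quoted there. The variable names are our own.

```python
from itertools import combinations

FACTORS = "ABCDE"
LEVELS = 3  # each variable varied in 3 ways

# Every main effect and interaction, e.g. "A", "AB", "ABC"; df = 2 ** (order).
terms = {"".join(c): (LEVELS - 1) ** len(c)
         for k in range(1, len(FACTORS) + 1)
         for c in combinations(FACTORS, k)}

labels = {1: "Main variables", 2: "Simple interactions", 3: "Second-order",
          4: "Third-order", 5: "Fourth-order"}
for k, label in labels.items():
    group = [df for t, df in terms.items() if len(t) == k]
    print(f"{label:<20}{len(group):3d} terms x {group[0]:2d} df = {sum(group):3d}")

total_df = LEVELS ** len(FACTORS) - 1          # 243 cells, one animal each
higher_order = sum(df for t, df in terms.items() if len(t) >= 3)
print("Total df:", total_df, "Pooled higher-order df:", higher_order)

# Remove the second-order terms containing both A and B from the pooled error.
ab_terms = [t for t in terms if len(t) == 3 and "A" in t and "B" in t]
print(ab_terms, higher_order - sum(terms[t] for t in ab_terms))
```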
8. THE ASSUMPTIONS INVOLVED IN POOLING HIGHER-ORDER
INTERACTIONS

Assumptions are involved, of course, in the pooling of the
sums of squares for higher-order interactions and their associated
degrees of freedom. In the first place, it is assumed that each of
the mean squares corresponding to the higher-order interactions
is an estimate of the same common variance, i.e., the assumption
of homogeneity of variance is involved. It is also assumed that 
this common variance would not differ significantly from the 
variance obtained with replication. If the higher-order interac- 
tions are not significant—and without replication and a corre- 
sponding test of significance this must remain an assumption— 
then the mean square derived from these interactions will estimate 
the same variance as estimated by the variance within groups. 

Under these conditions, the experimental variables A, B, C, 
D, and E may be tested for significance by the mean square based 
upon the higher-order interactions. The significance of the simple 
interactions may be tested in the same manner. If none of the 
simple interactions is significant, this fact provides good evidence 
that none of the higher-order interactions will be significant and 
therefore justifies the use of the higher-order interactions as an 
error term. 

Let us suppose, however, that one of the simple interactions, 
let us say the interaction between variable A and variable B, turns 
out to be highly significant. If that is the case, then the mean 
square based upon the pooled sum of squares for all higher-order 
interactions is likely to be biased in the direction of overestimating 
the “pure” experimental error that would have been obtained from 
replication of the experiment. 

If the simple interaction between A and B is significant, we 
should then isolate the sums of squares for the second-order
interactions which involve these 2 variables. These second-order
interactions would be A × B × C, A × B × D, and A × B × E.⁴
These sums of squares and their associated degrees of freedom 
would be subtracted from the pooled sum of squares and degrees 
of freedom for all higher-order interactions. Since each of the 
second-order interactions is based upon 8 degrees of freedom, then 
the subtraction of the 3 second-order interactions mentioned would 
leave a residual sum of squares based upon 168 degrees of freedom. 
The significance of the 3 second-order interactions in question 
could then be tested against the residual mean square based upon 
168 degrees of freedom. 


4 The second-order interactions may be calculated in the manner de- 
scribed earlier in the chapter. 




It has been mentioned that homogeneity of variance of the 
higher-order interaction mean squares is also involved in pooling 
them to obtain a single estimate of experimental error. Each of 
the mean squares based upon a higher-order interaction might be 
found and the set tested for homogeneity by means of Bartlett’s 
test described in an earlier chapter. If the test of homogeneity 
does not result in the rejection of the hypothesis of a common 
variance, then the pooling of the various sums of squares and
degrees of freedom is proper. 


9. AN EXAMPLE OF AN ANALYSIS WITHOUT REPLICATION 


Although the procedure of using interactions as estimates 
of experimental error has been followed in much published research, 
we should keep in mind that there is no substitute for replication.⁵
If there is any a priori basis for suspecting interactions to be sig- 
nificant, a test, based upon replication, should be provided in the 
design of the experiment. If the interaction mean squares are 
significant, then their use as an estimate of the mean square that 
would have been obtained with replication, the within-groups 
mean square, may result in an underevaluation of the significance 
of the main experimental variables. 

In this connection, it may be of interest to take the data 
given earlier and to consider the sums given in Table 44 as the 
scores of single subjects assigned to each of the 24 experimental 
conditions. We may assume, that is, that the experiment was 
carried out without replication, and consequently we shall not 
have available a mean square based upon the variation of subjects 
treated alike, a mean square within groups. 


5 See Wishart (1938) for a discussion of this point. 

6 There are available techniques for increasing the accuracy of the more
important comparisons, such as the experimental variables and the simple
interactions, at the expense of what the experimenter may consider to be the
less important comparisons, usually the higher-order interactions. When
some, but not all, information concerning a particular comparison is lost by
the nature of the experimental design imposed upon the observations, this is
called partial confounding. Complete loss of information is described as
confounding. These techniques are described in detail by Yates (1933b), Goulden
(1939), and Fisher (1942).




The correction term for origin will now be assumed to be
based upon the 24 single measures of Table 44, and the divisors 
for the other sums of squares will be reduced accordingly. If we 
carried out the analysis of variance under these conditions, we 
would arrive at the data of Table 50. 


TABLE 50. Analysis of Variance of the 4 × 3 × 2 Factorial Design without
Replication

Source of Variation          Sum of Squares     df    Mean Square       F
Between A                         1,176.00        3        392.00    2.70
Between B                           432.35        2        216.18    1.49
Between C                           384.00        1        384.00    2.64
Interaction: A × B                  521.00        6         86.83
Interaction: A × C                  469.35        3        156.45    1.08
Interaction: B × C                   63.00        2         31.50
Interaction: A × B × C              871.65        6        145.28

Total                             3,917.35       23


The sums of squares and mean squares shown in Table 50 
are not the same as those in Table 49. The values of F that are 
given, however, are identical with those that would be obtained 
from the data of Table 49, if each mean square had been tested 
against the second-order interaction mean square. For example, 
dividing the mean square for the A variable by the A X B X C 
interaction mean square in Table 50 gives 392.00/145.28 = 2.70. 
Similarly, in Table 49 the ratio of these same 2 mean squares is 
78.40/29.06 = 2.70. 
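Why the ratios agree: when each cell total is treated as one score instead of a sum of 5, every divisor shrinks by a factor of 5. Every sum of squares in Table 50 is therefore 5 times its counterpart in Table 49, and the factor cancels in any ratio of mean squares. The Python sketch below checks this for the main effects. It takes the second-order sum of squares of Table 49 as 174.33, and `f_against_triple` is an illustrative helper name of our own.

```python
# (sum of squares, df) from Table 49 and Table 50.
TABLE_49 = {"A": (235.20, 3), "B": (86.47, 2), "C": (76.80, 1),
            "A x B x C": (174.33, 6)}
TABLE_50 = {"A": (1176.00, 3), "B": (432.35, 2), "C": (384.00, 1),
            "A x B x C": (871.65, 6)}


def f_against_triple(table):
    """F for each main effect, using the A x B x C mean square as error."""
    err_ss, err_df = table["A x B x C"]
    err_ms = err_ss / err_df
    return {k: (ss / df) / err_ms
            for k, (ss, df) in table.items() if k != "A x B x C"}


f49, f50 = f_against_triple(TABLE_49), f_against_triple(TABLE_50)
for k in f49:
    print(k, round(f49[k], 2), round(f50[k], 2))  # same ratio from either table
```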

The assumption in this design is that the second-order inter- 
action A X B X C is not significant and is, therefore, to be used 
as an estimate of the mean square that would have been obtained 
with replication. By reference to the table of F, we find that 
none of the mean squares is significant when tested against the 
triple-interaction mean square. We would thus be led to the 
false conclusion, in this analysis, that none of the main variables 
is significant, whereas the previous analysis indicated that sig- 
nificant differences did exist. 




10. TESTING THE INTERACTION MEAN SQUARES FOR HOMOGENEITY
OF VARIANCE


Since the simple interactions in the last analysis did not 
prove to be significant when tested against the triple interaction— 
only the A x C mean square giving a ratio greater than 1.00—the 
question of homogeneity of variance of all the interaction mean 
squares may be raised. If this hypothesis proves to be tenable, 
then we may improve our estimate of experimental error by 
pooling the sums of squares for all interactions along with the 
corresponding degrees of freedom. From the data of Table 50, 
we would expect this pooled value to be somewhat less than the 
mean square for the triple interaction. But also important, along 
with the decrease in the value of the pooled mean square, would 
be the increase in degrees of freedom for the test of significance 
from 6 to 17. With a larger number of degrees of freedom for 
our error mean square, a smaller value of F will meet the require- 
ments of statistical significance. 

The hypothesis of homogeneity of variance may be tested 
in the manner already familiar. The essential calculations are 
given in Table 51. We see that the uncorrected value of χ² is


TABLE 51. Test of Homogeneity of Variance of the Interaction Mean Squares
of the 4 × 3 × 2 Factorial Design

Source                      df       Σx²        s²      log s²    (df)(log s²)
Interaction: A × B           6     521.00     86.83    1.93867     11.63202
Interaction: A × C           3     469.35    156.45    2.19424      6.58272
Interaction: B × C           2      63.00     31.50    1.49831      2.99662
Interaction: A × B × C       6     871.65    145.28    2.16227     12.97362

Sum                         17   1,925.00              7.79349     34.18498

Computations:
1. 1,925.00/17 = 113.2; log 113.2 = 2.05385
2. (17)(2.05385) = 34.91545
3. 34.91545 − 34.18498 = .73047
4. χ² = (2.3026)(.73047) = 1.682




equal to 1.682, and this is not significant for 3 degrees of freedom.
There is no need to compute the corrected value of χ², as this
will be even smaller than the uncorrected value. The hypothesis
of homogeneity of variance is tenable, and consequently we may 
take as our estimate of error a mean square based upon the pooled 
interactions. We thus arrive at the analysis shown in Table 52. 
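The steps of Table 51 are Bartlett's test, and they can be scripted. The Python sketch below is a minimal version that assumes the four interaction sums of squares and degrees of freedom from Table 50. It returns both the uncorrected χ² and the value divided by the usual correction factor C = 1 + [Σ(1/df) − 1/Σdf]/[3(k − 1)]. Because no logarithm is rounded here, the uncorrected value comes out slightly above the 1.682 of Table 51, at about 1.69. The corrected value is smaller still, as the text says. The function name `bartlett` is our own. SciPy's `scipy.stats.bartlett` works from raw observations rather than sums of squares, so it is not used here.

```python
import math

# Interaction sums of squares and df from Table 51.
SS = [521.00, 469.35, 63.00, 871.65]
DF = [6, 3, 2, 6]


def bartlett(ss, df):
    """Bartlett's chi-square for homogeneity of k variances,
    returned as (uncorrected, corrected); test df = k - 1."""
    variances = [s / d for s, d in zip(ss, df)]
    total_df = sum(df)
    pooled = sum(ss) / total_df
    # ln 10 (about 2.3026) converts the base-10 logarithms to natural ones.
    chi2 = math.log(10) * (total_df * math.log10(pooled)
                           - sum(d * math.log10(v) for d, v in zip(df, variances)))
    k = len(ss)
    c = 1 + (sum(1 / d for d in df) - 1 / total_df) / (3 * (k - 1))
    return chi2, chi2 / c


uncorrected, corrected = bartlett(SS, DF)
print(round(uncorrected, 3), round(corrected, 3))
```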


TABLE 52. Analysis of Variance of the 4 × 3 × 2 Factorial Design Using the
Pooled Interactions as the Error Term

Source of Variation          Sum of Squares     df    Mean Square       F
Between A                         1,176.00        3        392.00    3.46*
Between B                           432.35        2        216.18    1.91
Between C                           384.00        1        384.00    3.39
Pooled interactions               1,925.00       17        113.24

Total                             3,917.35       23


Although the values of F are now somewhat greater than 
previously and a smaller value of F will meet the requirements of 
significance because of the greater number of degrees of freedom 
for the error mean square, only one of the F ratios is significant. 
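Pooling simply adds the interaction sums of squares and their degrees of freedom before dividing. The Python sketch below reproduces the error mean square and the F ratios of Table 52 from the Table 50 entries. As before, the critical values of F (here for 17 error degrees of freedom) must still come from a table.

```python
# Main effects (sum of squares, df) and the four interaction lines of Table 50.
MAIN = {"A": (1176.00, 3), "B": (432.35, 2), "C": (384.00, 1)}
INTERACTIONS = [(521.00, 6), (469.35, 3), (63.00, 2), (871.65, 6)]

pooled_ss = sum(ss for ss, _ in INTERACTIONS)
pooled_df = sum(df for _, df in INTERACTIONS)
pooled_ms = pooled_ss / pooled_df  # error mean square based on 17 df

f_ratios = {k: (ss / df) / pooled_ms for k, (ss, df) in MAIN.items()}
print(round(pooled_ss, 2), pooled_df, round(pooled_ms, 2))
for k, f in f_ratios.items():
    print(k, round(f, 2))
```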


11. SUMMARY 


The lesson to be learned in comparing the analyses with and 
without replication is plain. If there is any reason to suspect 
that the higher-order interactions will be significant, replication 
should be seriously considered in the design of the experiment. 
With a small number of combinations of experimental conditions, 
replication is, of course, no problem. With as many as 50 different 
combinations, each additional replication will require 50 additional 
subjects. But even 1 additional replication would give 50 degrees 
of freedom for the measure of experimental error based upon the 
variation within groups. 

If a large number of combinations of experimental variables 
are involved in a particular problem, as in Crutchfield’s experi- 
ment where 243 combinations were involved, replication is, of 
course, costly. When an extremely large number of combinations 




of variables are being studied, it is possible that a sufficient number 
of the higher-order mean squares will prove to contribute but 
little to the over-all variation. These might then be tested for 
homogeneity of variance, and, if the results of the test are satis- 
factory, the interactions may be pooled to arrive at an estimate 
of experimental error which is but little inflated over what the 
estimate might have been with replication. 
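Below is a minimal sketch of Bartlett's test in its usual textbook form. It checks whether a set of independent mean squares, such as the higher-order interactions one proposes to pool, is homogeneous. The sketch assumes that every mean square is positive and that the mean squares are independent. The name `bartlett_chi_square` is hypothetical. The statistic is referred to a table of χ² with k − 1 degrees of freedom.

```python
import math


def bartlett_chi_square(mean_squares, dfs):
    """Bartlett's test for homogeneity of k independent mean squares.

    Returns (chi_square, df). Natural logarithms are used, which is
    equivalent to the common form with 2.3026 times the log10 expression.
    """
    k = len(mean_squares)
    total_df = sum(dfs)
    # Pooled mean square: total sum of squares over total df.
    pooled = sum(ms * df for ms, df in zip(mean_squares, dfs)) / total_df
    m = total_df * math.log(pooled) - sum(
        df * math.log(ms) for ms, df in zip(mean_squares, dfs)
    )
    # Correction factor C; it approaches 1 as the df grow large.
    c = 1 + (sum(1 / df for df in dfs) - 1 / total_df) / (3 * (k - 1))
    return m / c, k - 1
```

As an illustration, two mean squares of 1.0 and 4.0, each based on 10 degrees of freedom, give χ² ≈ 4.25 with 1 degree of freedom.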


12. EXAMPLES 


1. If an experiment involves variable A, which is varied in 
3 ways, variable B, which is varied in 2 ways, variable C, which is 
varied in 2 ways, and variable D, which is varied in 3 ways, and if 
the experiment is to be of factorial design with 5 subjects assigned 
to each of the experimental conditions, set up the summary table 
showing the sources of variation and the number of degrees of 
freedom associated with each. 

2. Assume that an experiment of factorial design was carried 
out with type size varied in 3 ways and type face varied in 4 ways. 
Five subjects were assigned at random to each of the 12 experi- 
mental conditions. Reading scores for the subjects are as shown 
in the table. Find the values of F for type size, type face, and 
the interaction. 


Type Face              Type Size
               Size 1    Size 2    Size 3
Type A           38        54        65
                 45        34        86
                 22        54        62
                 23        23        26
                 45        32        42
   Sum          173       197       281
Type B           24        21        35
                 43        67        45
                 56        98        76
                 75        46        89
                 43        55        98
   Sum          241       287       343
Type C           36        35        35
                 81        36        65
                 22        54        67
                 23        65        76
                 45        78        55
   Sum          207       268       298
Type D           45        45        34
                 55        98        65
                 34        65        65
                 34        34        43
                 45        54        36
   Sum          213       296       243


3. Child (1946) designed an experiment to test the hypothesis 
that preference for a more distant goal object, when found, is 
a result of experience in previous situations. “The experiment 
was planned so that if this assumption was correct, certain in- 
fluences of previous learning would be exhibited” (p. 3). The 
variables introduced were as follows: the sex of the children used 
as subjects in the experiment; the sex of the experimenter present 
during the test situation; the nature of the barrier introduced be- 
tween the subject and the distant goal object; and the type of 
instructions given to the child (pp. 3-4). “The basic technique 
of these experiments was to place children in the position of having 
to choose between two desirable goals, one of which was more 
accessible than the other, and to observe their reactions” (p. 5).
Since each variable was varied in 2 ways, we have a total of 16
experimental conditions. Subjects were school children in grades
1 through 7. They were divided into groups of 34 to 45 subjects
each. The data are given in terms of the per cent choosing the
more distant goal. Child states that the percentages are “close
enough to 50, to suggest an adequate approximation to the as-
sumption of normal distribution of sampling errors” (pp. 18-19).
The analysis of variance was also applied, however, making use
of the angular transformation of Bliss (see p. 203) for percentages
of occurrences. The values of F obtained with the angular trans-
formation were slightly different, but no conclusions concerning
significance were changed by the analysis of the transformed data.
Since none of the interaction mean squares was significant when
tested against the pooled mean squares of the other interactions,
the pooled mean square for all interactions was used as the error
term. The data are as follows:


Per Cent of Subjects Choosing More Distant Goal under Various Com-
binations of Experimental Variables


                          Male Subjects                Female Subjects
                      Cued         Noncued         Cued         Noncued
                  Instructions  Instructions   Instructions  Instructions
Male experimenter
  Table barrier         43            36             13            21
  Ladder barrier        40            50             24            32
Female experimenter
  Table barrier         33            41             39            30
  Ladder barrier        55            46             37            43


(a) Compute the sums of squares for all main effects and
interactions. (b) You will readily observe that no one of the inter-
action mean squares is significant if tested against the third-order
interaction mean square. This will also be true if each interaction
mean square is tested against the pooled value for the remaining
interactions. (c) Test all the interaction mean squares for homo-
geneity by applying Bartlett’s test using χ². (d) Use the pooled
mean square for all interactions to test the significance of the
main effects.
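Child's second analysis used Bliss's angular transformation. This sketch assumes the usual definition, angle = arcsin √p expressed in degrees, where p is the proportion. The helper name `angular` is hypothetical. Values read from a published table of the transformation may differ slightly because of rounding.

```python
import math


def angular(percent):
    """Angular (arcsine) transformation of a percentage, in degrees."""
    return math.degrees(math.asin(math.sqrt(percent / 100)))


# Percentages from the table above (male experimenter, table barrier).
row = [43, 36, 13, 21]
print([round(angular(p), 1) for p in row])
```

The transformation stretches the scale near 0 and 100 per cent, where binomial variances are smallest. Near 50 per cent it is close to linear. This is consistent with Child's report that the transformed analysis changed no conclusions for percentages fairly close to 50.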

4. An experiment might have been designed to study the
influence of 2 drugs in 3 different concentrations upon performance
such as steadiness, arithmetic operations, letter cancellations, etc.
Let us call the drugs A and B and indicate the concentrations by
1, 2, and 3. The experiment is of factorial design with 10 subjects
assigned to each condition. The results of the hypothetical
experiment are given in the table below.

(a) Apply Bartlett’s test for homogeneity of variance. (b) 
Analyze the data by the methods of the analysis of variance and 
find the various values of F. (c) How would you interpret the 
results of the experiment? 


              Drug A                        Drug B
          Concentration                 Concentration
        1       2       3            1       2       3
       19      43      53           30      49      64
       10      42      51           24      43      61
       21      41      57           25      49      68
       15      44      57           30      53      56
       20      42      68           28      44      60
       24      49      60           34      46      55
       16      46      48           31      46      54
       22      39      47           32      56      68
       18      48      60           27      54      59
       18      39      60           32      53      57


5. Consider another experiment in which variable A is 
varied in 4 ways, variable B is varied in 3 ways, variable C is 
varied in 3 ways, and variable D is varied in 2 ways. Let us 
suppose that the experimenter forgoes additional replications and
assigns but one animal to each experimental condition. Set up 
the summary table showing the sources of variation and the degrees 
of freedom associated with each. 

6. The following data are for practice in computation. (a) 
Compute the various sums of squares for the main effects and 
interactions. (b) Test the interaction mean squares for homo-
geneity.


            A      B      C
I     X    40     60     70
      Y    60     20     20
      Z    50     90     50
II    X    30     60     60
      Y    60     10     60
      Z    20     90     10


CHAPTER 14


Experimental Designs Involving 
Matched Groups 


1. AN EXPERIMENT DESCRIBED 


An investigator is interested in the influence of noise upon 
simple mental processes such as might be involved in solving
arithmetic problems. Subjects are assigned at random to 2 
groups. One group is then assigned to an experimental condition 
where they will solve problems in a situation where noise prevails 
and the other will perform under conditions of quiet. The group 
working under the condition of quiet we may call the control group, 
and the group working under the condition of noise we may call 
the experimental group. The outcomes of the experiment may be 
recorded in terms of the number of problems correctly solved dur- 
ing the experimental periods or the speed of solving. 

The mean performance score for each group could be found 
and the difference between these 2 means could be tested for 
significance by the t or F tests. Let us suppose that the difference
between the means is fairly large, but that the test of significance 
fails to reject the hypothesis of random sampling from a common 
population. The reason for this is that the denominator of the t 
ratio is large. 


2. A CHANGE IN THE DESIGN 


We may now raise the question as to whether the variation in 
performance of the subjects under the test conditions cannot be 
attributed partly to the variation of the subjects in initial ability, 
i.e., individual differences present prior to the experimental test. 
Such differences, if present, would not account for the superiority 
of performance of subjects, let us say, under the condition of quiet,
for, if we have assigned the subjects at random to the 2 experi-
mental conditions, we would expect the 2 groups to be equated with
respect to initial differences in ability within the limits of random 
assignment. But it should be obvious that such differences in 
initial ability, if related to performance under the test conditions, 
would contribute to the variation within groups. This means that 
the within-groups sum of squares would be inflated over what it 
might be if the subjects within each group were more nearly equal 
with respect to initial ability. 

Since we are not assuming perfect correlation between initial 
performance and performance under the test conditions, even if we 
had subjects exactly equal in initial ability, there would still be 
some variability present within the groups under the test condi- 
tions, but the variability would not be as great as it would be with 
subjects of widely varying levels of initial ability. This might 
indicate that if we tested subjects prior to the experiment and 
then used only those with approximately the same level of ability, 
a significant difference between the means of the 2 test conditions 
would be found. 

A major objection to this procedure would be the difficulty of 
obtaining subjects with the same level of initial ability. Further- 
more, any conclusions drawn from the experiment would be limited 
to the particular level of ability investigated. There is, however,
an alternative approach. If it is possible statistically to
isolate the variation (sum of squares) attributable to differences in
initial ability, this could be removed from the total variation.
Since we have already given reasons for not expecting this
variation to be related to the variation in mean performance of the
2 groups as long as random assignment is followed, we may expect
its removal to reduce the variation within groups (the within-groups
sum of squares).

In the experiment mentioned we might give all subjects a 
practice period prior to the experiment proper to determine the 
levels of initial ability of the subjects. We could then rank the 
subjects in order of initial ability and take the 2 highest subjects 
and assign one at random to one of the experimental conditions and 
the other to the second experimental condition. Similarly, we 
could assign the members of the next pair of subjects, the second 


2660  EXPERIMENTAL DESIGN IN PSYCHOLOGICAL RESEARCH 


next pair, and so on, to the 2 conditions of the experiment. With 
this method of assignment, we would expect the means and the 
variances of the 2 groups to be approximately the same with 
respect to initial ability. The subjects are now tested under the
experimental conditions, and the outcomes or scores of the groups,
assuming a total N of 40, are given in Table 53.
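The assignment procedure just described can be sketched in Python: rank the subjects on the initial measure, take them in adjacent pairs, and randomize within each pair. The sketch generalizes to k conditions by taking blocks of k adjacent subjects. The function name and the pretest scores in the usage lines are hypothetical. Ties in the initial scores are broken by the original order of the subjects, because Python's sort is stable.

```python
import random


def matched_assignment(scores, k=2, seed=None):
    """Match subjects on an initial score and randomize within blocks.

    scores: initial (pretest) score for each subject, indexed 0..N-1.
    k: number of experimental conditions; N must be a multiple of k.
    Returns a list of blocks; block[j] is the subject given condition j.
    """
    if len(scores) % k:
        raise ValueError("number of subjects must be a multiple of k")
    rng = random.Random(seed)
    # Rank subjects from highest to lowest initial score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    blocks = []
    for start in range(0, len(order), k):
        block = order[start:start + k]   # k subjects of adjacent rank
        rng.shuffle(block)               # random order = random conditions
        blocks.append(block)
    return blocks


pretest = [12, 7, 15, 9, 11, 14, 8, 10]   # hypothetical initial scores
for quiet, noise in matched_assignment(pretest, k=2, seed=1):
    print(quiet, noise)
```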
TABLE 53. Performance Scores of Pairs of Subjects, Matched on the Basis of
Initial Ability, One Member of Each Pair Being Tested
under a Condition of Noise and the
Other under a Condition of Quiet


Level of    Control Group:    Experimental Group:     Sum         Difference
Initial     Quiet Condition   Noise Condition
Ability     X₁                X₂                      (X₁ + X₂)   (X₁ − X₂)
   1          14                14                     28            0
   2          14                12                     26            2
   3          12                11                     23            1
   4          12                11                     23            1
   5          11                 9                     20            2
   6          11                10                     21            1
   7          10                 9                     19            1
   8          11                10                     21            1
   9           9                 9                     18            0
  10          10                 9                     19            1
  11           8                 9                     17           −1
  12          10                 9                     19            1
  13           8                 8                     16            0
  14           7                 8                     15           −1
  15           8                 8                     16            0
  16           7                 6                     13            1
  17           4                 1                      5            3
  18           5                 2                      7            3
  19           5                 4                      9            1
  20           4                 1                      5            3
 Sum         180               160                    340           20


3. ANALYSIS OF THE DATA


Let us neglect, for the moment, the method by which the
subjects have been assigned to the 2 groups and apply a two-part
analysis of variance to the data of Table 53. The sums of squares
would be given by 


$$\text{Total} = (14)^2 + (14)^2 + (12)^2 + \cdots + (1)^2 - \frac{(340)^2}{40} = 424.0$$

$$\text{Between} = \frac{(180 - 160)^2}{(2)(20)} = 10.0$$

$$\text{Within} = \text{total} - \text{between} = 424.0 - 10.0 = 414.0$$


The summary of the analysis is given in Table 54, and we
observe that the mean square between groups is approximately the
same as the mean square within groups and consequently the
hypothesis of random sampling from a common population will be
regarded as tenable.


TABLE 54. Two-Part Analysis of Variance of the Performance Scores of Matched
Pairs of Subjects


Source of            Sum of      df      Mean
Variation            Squares             Square       F
Between groups        10.00       1      10.000      .918
Within groups        414.00      38      10.895
Total                424.00      39
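The two-part analysis can be reproduced with a short Python sketch. The scores are transcribed from Table 53. `two_part_anova` is a hypothetical helper that follows the computing routine of the text: squared totals divided by their numbers of observations, less the correction term.

```python
# Scores from Table 53, listed by level of initial ability (1 to 20).
quiet = [14, 14, 12, 12, 11, 11, 10, 11, 9, 10, 8, 10, 8, 7, 8, 7, 4, 5, 5, 4]
noise = [14, 12, 11, 11, 9, 10, 9, 10, 9, 9, 9, 9, 8, 8, 8, 6, 1, 2, 4, 1]


def two_part_anova(groups):
    """Between/within analysis; returns SS total, between, within, and F."""
    scores = [x for g in groups for x in g]
    n_total = len(scores)
    correction = sum(scores) ** 2 / n_total      # correction term
    ss_total = sum(x * x for x in scores) - correction
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_within = ss_total - ss_between
    df_between, df_within = len(groups) - 1, n_total - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_total, ss_between, ss_within, f


total, between, within, f = two_part_anova([quiet, noise])
print(total, between, within, round(f, 3))
```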

If we now take into consideration the fact that the measure- 
ments in each row represent the performance of 2 subjects of com- 
parable levels of initial ability in the 2 test conditions, we have a 
logical basis for computing a sum of squares between rows. This 
sum of squares is found in the same manner as that by which we 
compute the sum of squares between groups. We square the sums 
of the various rows, divide by the number of observations upon 
which the sum is based, summate these values, and subtract the 
correction term for origin. Thus, 


$$\text{Rows} = \frac{(28)^2}{2} + \frac{(26)^2}{2} + \frac{(23)^2}{2} + \cdots + \frac{(5)^2}{2} - \frac{(340)^2}{40} = 401.0$$


The sum of squares calculated above represents the varia-
tion in mean performance of subjects of different levels of initial
ability. The number of degrees of freedom for this sum of squares
will be 1 less than the number of means (or sums) involved. The
degrees of freedom will be, therefore, 20 − 1 = 19. Now subtract
this sum of squares from the sum of squares within groups pre-
viously calculated, and we shall be left with a remainder or residual
sum of squares. Thus,

$$\text{Within groups} - \text{between rows (pairs)} = \text{residual}$$

$$414.0 - 401.0 = 13.0$$


Similarly, for the degrees of freedom, we have 


$$\text{Within groups} - \text{between rows (pairs)} = \text{residual}$$

$$38 - 19 = 19$$


The residual sum of squares represents the remaining varia-
tion that cannot be accounted for in terms of the variation between
the column means (experimental conditions) and the row means
(initial levels of ability). It may be noted that if we compute the
sum of squares between groups (columns) and the sum of squares
between pairs (rows), then the residual sum of squares may be
obtained without the necessity of calculating the sum of squares
within groups. Thus,

$$\text{Total} - \text{between groups (columns)} - \text{between pairs (rows)} = \text{residual}$$

$$424.0 - 10.0 - 401.0 = 13.0$$


The summary of this analysis is shown in Table 55.


TABLE 55. Three-Part Analysis of Variance of the Performance
Scores of Matched Pairs of Subjects


Source of                   Sum of      df      Mean
Variation                   Squares             Square        F
Between groups (columns)     10.00       1      10.000      14.620
Between pairs (rows)        401.00      19      21.105      30.855
Residual (error)             13.00      19        .684
Total                       424.00      39


It is obvious from the data that the variation between pairs of
subjects accounts for a great part of the variation measured by the
sum of squares within groups. When this source of variation is statis-
tically isolated, the residual sum of squares within groups is only
13.0. Testing the significance of the mean square for pairs of sub-
jects against the residual mean square, we find that the obtained
value of F is highly significant for 19 and 19 degrees of freedom.
We would expect, of course, to find that there are significant
differences present among the row means if level of initial ability
is positively related to performance under the test conditions.

The major factor of interest, however, is the mean square 
between groups. When tested for significance against the residual 
mean square, this is also highly significant, although the test is now 
based upon 1 and 19 degrees of freedom instead of 1 and 38 as in 
the previous analysis. Despite the loss of the additional 19 degrees
of freedom for the sum of squares between pairs of subjects, the 
reduction in the size of our error term more than offsets the loss of 
the degrees of freedom. 
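The three-part analysis of Table 55 treats the matched pairs as rows and the 2 conditions as columns, with one observation per cell. The residual is therefore the rows × columns interaction. In the Python sketch below, the Table 53 scores are repeated so that the block stands alone, and `three_part_anova` is a hypothetical name. Carried to full precision, the F for groups is 10.0/(13.0/19) ≈ 14.615. The 14.620 of Table 55 reflects the residual mean square rounded to .684.

```python
# Scores from Table 53, listed by level of initial ability (1 to 20).
quiet = [14, 14, 12, 12, 11, 11, 10, 11, 9, 10, 8, 10, 8, 7, 8, 7, 4, 5, 5, 4]
noise = [14, 12, 11, 11, 9, 10, 9, 10, 9, 9, 9, 9, 8, 8, 8, 6, 1, 2, 4, 1]


def three_part_anova(table):
    """Rows x columns analysis with one observation per cell.

    Returns {source: (sum_of_squares, df, F)}; F is None for the residual,
    which serves as the error term.
    """
    r, c = len(table), len(table[0])
    scores = [x for row in table for x in row]
    correction = sum(scores) ** 2 / (r * c)
    ss_total = sum(x * x for x in scores) - correction
    ss_rows = sum(sum(row) ** 2 for row in table) / c - correction
    col_sums = [sum(row[j] for row in table) for j in range(c)]
    ss_cols = sum(s * s for s in col_sums) / r - correction
    ss_resid = ss_total - ss_rows - ss_cols          # interaction
    df_rows, df_cols = r - 1, c - 1
    ms_resid = ss_resid / (df_rows * df_cols)
    return {
        "columns": (ss_cols, df_cols, ss_cols / df_cols / ms_resid),
        "rows": (ss_rows, df_rows, ss_rows / df_rows / ms_resid),
        "residual": (ss_resid, df_rows * df_cols, None),
    }


pairs = list(zip(quiet, noise))   # one row per matched pair
for source, (ss, df, f) in three_part_anova(pairs).items():
    print(source, round(ss, 2), df, None if f is None else round(f, 3))
```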


4. THE NATURE OF THE RESIDUAL SUM OF SQUARES 


What is the nature of the residual sum of squares which we 
have used as our estimate of experimental error in this analysis? 
A careful examination of Table 53 and the method of classification 
of the data will reveal that the residual sum of squares corresponds 
exactly to the interaction between the row variable and the column 
variable or, in other words, to the interaction between level of initial 
ability and the experimental conditions. 

We may consider the data of Table 53 as consisting of the 
outcomes of an experiment where level of initial ability is varied in 
20 ways and the experimental variable is varied in 2 ways. With a
factorial design, this would require (2)(20) = 40 subjects for one
replication. Since we have precisely 40 subjects, it is clear that
the total of 39 degrees of freedom will be accounted for in the
following way: Between experimental conditions will take 1 degree
of freedom; between levels of initial ability will take 19 degrees of
freedom; interaction between the experimental conditions and
level of initial ability will take (1)(19) = 19 degrees of freedom.
The fact that we have not replicated within the various levels of 
ability means, of course, that we are not able to compute a sum of 
squares within the cells of the table, for we have but a single
observation in each cell. Hence, we have no sum of squares within
cells, as we would have with a factorial design with replication, and 
our estimate of experimental error must be based upon the interac- 
tion sum of squares. 

It may be recalled that we have previously discussed (pp. 
247-252) the conditions under which interaction mean squares may 
be used as estimates of experimental error. One of these was 
that the categories of one or more of the interacting variables be 
a random sampling from some defined population. In designs of 
the nature described in this chapter, the categories of the row 
variable are assumed to meet this requirement. Thus, it is as- 
sumed, in the present problem, that the various levels of ability 
represent a randomly selected sample from a defined population 
of levels of ability. 


5. THE EFFICIENCY OF DESIGNS INVOLVING MATCHED GROUPS 


An examination of the scores for the pairs of subjects in 
Table 53 will reveal why the sum of squares for subjects of the same 
level of initial ability accounts for such a great part of the variation 
within groups. It may be seen that each pair of subjects tends to 
react in relatively the same way under the experimental conditions.
There is, in other words, a high degree of positive correlation be-
tween the pairs of scores, as measured by the correlation coefficient
which is .95. The example cited is purely fictitious, and the data
were so manipulated as to make the correlation as high as this for 
purposes of illustration. It is extremely unlikely that in actual 
experiments correlations as high as this would be observed. The 
correlations to be expected from experimental data are more likely 
to be in the neighborhood of .30 to .50, or perhaps lower. 

The residual (interaction) mean square is dependent upon 
the degree of correlation present between the columns of scores. 
It can be proved that the residual mean square will be less than the
mean square within columns when the intercorrelations are in
general positive and greater when they are in general negative.¹

If the intercorrelations are zero, then the residual mean square
will be equal to the mean square within groups in the two-part
analysis, and the efficiency of the design would actually be less than
if the subjects had not been matched. The reason for the loss in
efficiency would be that the test of significance of the experimental
variable, to take the case at hand, would be based upon 1 and 19
degrees of freedom rather than 1 and 38 degrees of freedom which
would be available if the subjects had not been matched. For 1
and 19 degrees of freedom, a value of F equal to 8.18 is required for
significance at the 1 per cent point, whereas for 1 and 38 degrees of
freedom a value of F equal to 7.35 would be regarded as significant.

¹ Strictly speaking, it is only if the average of the covariances is positive
or negative that these conditions can be established rigorously.

The point made above can be illustrated with the data of
Table 56. Assume that the subjects have been divided into 5 
levels upon the basis of an initial test and that they are, therefore, 
equated across rows on the basis of this variable. The subjects 
within each level are then assigned at random to the experimental 
conditions designated by the letters A, B, C, and D. The per- 
formance scores of the subjects under the experimental conditions 
are given in Table 56. 


TABLE 56. Scores of 5 Groups of Subjects Tested under Different Experi-
mental Conditions: Subjects Equated Across Rows on the Basis
of a Test of Initial Performance


Initial Test        Experimental Conditions
Level             A     B     C     D        Sum
  1               5     2     7     4         18
  2               4     3     6     3         16
  3               3     4     5     2         14
  4               2     5     4     1         12
  5               1     6     3     0         10
 Sum             15    20    25    10         70


Ignoring the fact that the subjects have been matched across 
rows, let us make the usual two-part analysis of the total sum of 
squares. Then we have 


$$\text{Total} = (5)^2 + (4)^2 + (3)^2 + \cdots + (0)^2 - \frac{(70)^2}{20} = 65.0$$

$$\text{Between columns} = \frac{(15)^2}{5} + \frac{(20)^2}{5} + \frac{(25)^2}{5} + \frac{(10)^2}{5} - \frac{(70)^2}{20} = 25.0$$

$$\text{Within columns} = \text{total} - \text{between} = 65.0 - 25.0 = 40.0$$


The summary of this analysis is given in Table 57. Note 
that the mean square within groups is 2.5 and that F = 8.3/2.5 = 
3.32 is based upon 3 and 16 degrees of freedom. The obtained 
value of F slightly exceeds the tabled value 3.24 for 3 and 16 
degrees of freedom and may be regarded as significant at the 5 per 
cent point. 


TABLE 57. Analysis of Variance of the Scores of 5 Groups of Subjects
Tested under Different Experimental Conditions with the
Matching Variable Ignored


Source of Variation     Sum of Squares    df    Mean Square     F
Between columns              25.0          3        8.3        3.32
Within columns               40.0         16        2.5
Total                        65.0         19


Now let us take into account the fact that the subjects were 
matched across rows and obtain the sum of squares for the row 
variable. This will be given by 


$$\text{Between rows} = \frac{(18)^2}{4} + \frac{(16)^2}{4} + \frac{(14)^2}{4} + \frac{(12)^2}{4} + \frac{(10)^2}{4} - \frac{(70)^2}{20} = 10.0$$


Then the residual sum of squares (interaction) may be obtained by 
subtracting the sum of squares for rows and the sum of squares for 
columns from the total sum of squares. Thus, 


$$\text{Residual} = 65.0 - 10.0 - 25.0 = 30.0$$


The summary of this analysis is given in Table 58. Note that 
the residual mean square is 2.5, the same value that we obtained in 
the previous analysis, and that the value of F = 8.3/2.5 = 3.32 is
unchanged. But now we enter the table of F with 3 and 12
degrees of freedom instead of the 3 and 16 we had available in the
previous analysis. We find that a value of F equal to 3.49 will be 
required for significance at the 5 per cent point and hence our 
obtained value is not significant according to this standard. 


TABLE 58. Analysis of Variance of the Scores of 5 Groups of Subjects
Tested under Different Experimental Conditions: Groups Equated
on the Basis of an Initial Test


Source of Variation     Sum of Squares    df    Mean Square     F
Between columns              25.0          3        8.3        3.32
Between rows                 10.0          4        2.5        1.00
Residual (error)             30.0         12        2.5
Total                        65.0         19


The two analyses of the data of Table 56 illustrate one of the 
points made concerning the efficiency of the matched subject 
design. It may be determined, for example, that the average inter-
correlation of the columns of Table 56 is zero. Thus, while the
value of F remains unchanged, the loss in degrees of freedom for
evaluating F makes the matched group design less efficient than if
the subjects had not been matched. In general, it can be said that 
positive correlation between the columns will result in a residual 
mean square that is smaller than the mean square within groups 
of the usual two-part analysis. But if the design using matched 
subjects across rows is to be more efficient than the two-part 
analysis, the correlation must be sufficiently high as to offset the 
loss in degrees of freedom when the table of F is entered. 
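The Python sketch below verifies the two analyses of Table 56 and the zero average intercorrelation of its columns. It uses the standard library only, and the `pearson` helper is written out rather than taken from any particular library. Both error terms come out at 2.5. Only their degrees of freedom differ: 16 for the within-columns term and 12 for the residual.

```python
import math
from itertools import combinations

# Table 56: rows = initial-test levels 1-5, columns = conditions A-D.
table56 = [
    [5, 2, 7, 4],
    [4, 3, 6, 3],
    [3, 4, 5, 2],
    [2, 5, 4, 1],
    [1, 6, 3, 0],
]
r, c = len(table56), len(table56[0])
columns = [list(col) for col in zip(*table56)]

scores = [x for row in table56 for x in row]
correction = sum(scores) ** 2 / (r * c)
ss_total = sum(x * x for x in scores) - correction                 # 65.0
ss_cols = sum(sum(col) ** 2 for col in columns) / r - correction   # 25.0
ss_rows = sum(sum(row) ** 2 for row in table56) / c - correction   # 10.0

ms_within = (ss_total - ss_cols) / (r * c - c)                     # 16 df
ms_resid = (ss_total - ss_cols - ss_rows) / ((r - 1) * (c - 1))    # 12 df


def pearson(x, y):
    """Product-moment correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rs = [pearson(a, b) for a, b in combinations(columns, 2)]
print(ms_within, ms_resid, sum(rs) / len(rs))
```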


6. THE MATCHING VARIABLES 


In some cases it will be possible to obtain an initial measure, 
prior to the experiment proper, on the subjects in an experiment 
with the same instrument which is to be used to measure the out- 
comes of the experiment. These initial measures might then be 
used to match subjects across the rows. In other cases where it is 
not practical to obtain an initial measure on the same instrument, 
it may still be possible to match subjects on the basis of some other 
variable which we have reason to believe will be positively cor-
related with measures obtained under the experimental conditions.
The variables used for matching purposes will depend, of course, 
upon the kinds of measurements to be made in the experimental 
situations. If we were studying the influence of various diets 
upon gain in weight, we might equate subjects across rows upon the 
basis of initial weights. In other experiments, subjects might be 
equated on the basis of degree of education, income level, attitudes, 
test performance, age, and so forth. Here again, the extent to 
which the performance of subjects of the same level on the match- 
ing variable turns out to be more homogeneous under the various 
experimental conditions than the performance of unselected sub- 
jects of varying levels will determine the efficiency of the design. 


7. AN ANALYSIS OF SEVERAL MATCHED GROUPS 


As an illustration of the increased efficiency resulting from
taking into account a matching variable, let us take an experiment
on the influence upon weight of various amounts of a vitamin 
introduced into a diet. Suppose that we have 25 rats of the same 
age and sex and that they are divided into 5 levels of initial weight 
at the beginning of the experiment. Five different dosages of the 
vitamin are to be investigated, and a rat from each initial weight 
level is assigned at random to one of the treatments. The rats 
across rows have thus been matched upon the basis of initial 
weight. The outcomes of this hypothetical experiment are given 
in Table 59. The analysis of the sums of squares would be as 
before. Thus, 


TABLE 59. Gains in Weight of 5 Groups of Animals Given Different Diets— 
Animals Equated across Rows on the Basis of Initial Weights 


Initial Weight          Experimental Conditions
Level            1     2     3     4     5       Sum
  1             18    20    20    21    21       100
  2             17    19    19    20    20        95
  3             16    17    18    19    20        90
  4             16    16    17    18    18        85
  5             16    16    15    17    16        80
 Sum            83    88    89    95    95       450


$$\text{Total} = (18)^2 + (17)^2 + (16)^2 + \cdots + (16)^2 - \frac{(450)^2}{25} = 78.00$$

$$\text{Rows} = \frac{(100)^2}{5} + \frac{(95)^2}{5} + \frac{(90)^2}{5} + \frac{(85)^2}{5} + \frac{(80)^2}{5} - \frac{(450)^2}{25} = 50.00$$

$$\text{Columns} = \frac{(83)^2}{5} + \frac{(88)^2}{5} + \frac{(89)^2}{5} + \frac{(95)^2}{5} + \frac{(95)^2}{5} - \frac{(450)^2}{25} = 20.80$$

$$\text{Residual (interaction)} = 78.00 - 50.00 - 20.80 = 7.20$$


The summary of our analysis is given in Table 60.


TABLE 60. Analysis of Variance of Gains in Weight of 5 Groups of Animals
Given Different Diets: Groups Matched on the Basis
of Initial Weights


Source of Variation     Sum of Squares    df    Mean Square      F
Between rows                 50.0          4       12.50       27.78
Between columns              20.8          4        5.20       11.56
Residual (error)              7.2         16         .45
Total                        78.0         24


The mean square for the experimental conditions (columns) is significant
when tested against the residual mean square. It may be ob- 
served that if the row sum of squares had not been removed from 
the term we used as error, we would have for the sum of squares 
within groups 50.0 + 7.2 = 57.2, and this value, based upon 20 
degrees of freedom, would give as a mean square within groups a 
value of 2.86. If the mean square for the experimental conditions 
was tested against this mean square, we would thus have F = 
5.20/2.86 = 1.82. This value does not meet the requirements of 
significance for 4 and 20 degrees of freedom. 

The results of the matching across rows in this, of course, 
purely hypothetical example are extremely efficient, the row 
variation accounting for a good portion of the variation within 
groups. By taking this source of variation into account in the 
design, the error term is reduced considerably. The reduction is 
sufficiently great so that the test of significance of the experimental
variable results in a significant value of F, whereas without the
isolation of the row sum of squares, the value of F would not be 
significant. Actual experiments, of course, may not turn out this 
happily. 
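The sketch below reproduces the comparison made above for Table 59: F = 11.56 with the residual as error, against F = 1.82 when the row variation is left in the error term. The data are transcribed from Table 59, and the variable names are illustrative only.

```python
# Table 59: rows = initial-weight levels 1-5, columns = conditions 1-5.
table59 = [
    [18, 20, 20, 21, 21],
    [17, 19, 19, 20, 20],
    [16, 17, 18, 19, 20],
    [16, 16, 17, 18, 18],
    [16, 16, 15, 17, 16],
]
r, c = len(table59), len(table59[0])
scores = [x for row in table59 for x in row]
correction = sum(scores) ** 2 / (r * c)
ss_total = sum(x * x for x in scores) - correction
ss_rows = sum(sum(row) ** 2 for row in table59) / c - correction
ss_cols = sum(sum(col) ** 2 for col in zip(*table59)) / r - correction
ss_resid = ss_total - ss_rows - ss_cols

ms_cols = ss_cols / (c - 1)
# Error term with the row (matching) variation removed: 16 df.
f_matched = ms_cols / (ss_resid / ((r - 1) * (c - 1)))
# Error term if the rows are ignored: within groups, 20 df.
f_unmatched = ms_cols / ((ss_rows + ss_resid) / (r * c - c))
print(round(f_matched, 2), round(f_unmatched, 2))
```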


8. THE t TEST APPLIED TO 2 MATCHED GROUPS 


In the case of an experimental design which involves but 2 
groups of paired observations, the evaluation of the experiment 
could be made by means of the t test instead of the F test.  Identi- 
cal results would be obtained, as they should be, but we have 
discussed the F test first because it is the more generalized method 
of analysis applicable to the case of several equated or matched 
groups. Since many published studies, however, make use of the 
t test instead of the F test in the case of 2 groups, this method will 
be discussed briefly. 

We may take the data of Table 53, which we have already 
treated by means of the analysis of variance. Consider the dis- 
tribution of differences between the pairs of observations 
X₁ − X₂ = X. We could find the mean of this distribution of
differences and the variance of the distribution. Then the
variance of the mean of the differences would be 1/n the variance
of the individual differences. The square root of the variance of
the mean of the differences would of course be the standard error of
the mean of the differences. Since the mean of the differences is
equal to the difference between the 2 means, the difference between
the means divided by the standard error of the difference will give
us the t ratio. Now it is a simple matter of algebra to show that the
standard error of the difference between the means, when observa-
tions have been paired and when n₁ = n₂ = n, will be given by


$$s_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{\sum x^2}{n(n-1)}} \qquad (64)$$


The value of Σx² in formula (64) above is based upon the
deviations of the differences X₁ − X₂ = X from the mean of the
distribution of differences. It may be obtained in the usual way,
from column 5 of Table 53. Thus,

$$\sum x^2 = (0)^2 + (2)^2 + (1)^2 + \cdots + (3)^2 - \frac{(20)^2}{20} = 26.0$$

Then substituting in formula (64), we obtain

$$s_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{26.0}{(20)(19)}} = \sqrt{.068421} = .2616$$


Under the hypothesis of random sampling from a common 
population, the population mean difference will be equal to zero. 
Then the deviation of the observed mean difference, which is equal 
to 1.0, may be tested as a random deviation from the population 
value. Thus, we have 


$$t = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{s_{\bar{X}_1-\bar{X}_2}} = \frac{1.0}{.2616} = 3.823$$


The number of degrees of freedom for this t will be, as usual, 1 less
than the number of independent observations. We have 20 differ-
ences, and thus we shall have 20 − 1 or 19 degrees of freedom for
evaluating the t. From the table of t we find that our obtained
value of 3.823 is highly significant. The null hypothesis will be
rejected. We may note that this is exactly the same conclusion
which we arrived at by means of the F test under the condi-
tions described. When only 2 groups of paired observations are
involved and the analysis of variance takes the form described in
this chapter, then √F will be equal to t, or t² = F. Squaring t,
we have (3.823)² = 14.6153, and this, within limits of rounding
errors, is equal to the F of 14.620 which we found earlier.
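The t computation of formula (64) for Table 53 can be sketched in Python as follows. The scores are transcribed from Table 53, and the variable names are illustrative. The last printed value, t², equals 10.0/(13.0/19), which is the unrounded F for groups in the three-part analysis.

```python
import math

# Scores from Table 53, listed by level of initial ability (1 to 20).
quiet = [14, 14, 12, 12, 11, 11, 10, 11, 9, 10, 8, 10, 8, 7, 8, 7, 4, 5, 5, 4]
noise = [14, 12, 11, 11, 9, 10, 9, 10, 9, 9, 9, 9, 8, 8, 8, 6, 1, 2, 4, 1]

diffs = [a - b for a, b in zip(quiet, noise)]              # column 5
n = len(diffs)
mean_diff = sum(diffs) / n                                 # 1.0
sum_x2 = sum(d * d for d in diffs) - sum(diffs) ** 2 / n   # 26.0
se = math.sqrt(sum_x2 / (n * (n - 1)))                     # formula (64)
t = (mean_diff - 0) / se                                   # df = n - 1 = 19
print(round(se, 4), round(t, 3), round(t * t, 3))
```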

The point that we made before concerning the efficiency of the 
experimental designs involving matching as being dependent upon 
the degree of correlation present between the pairs of observations 
can be made clearer by an alternate method of expressing formula 
(64). It can be shown that formula (64) is identical with 


$$s^2_{\bar{X}_1-\bar{X}_2} = s^2_{\bar{X}_1} + s^2_{\bar{X}_2} - 2r_{X_1X_2}\, s_{\bar{X}_1} s_{\bar{X}_2} \qquad (65)$$

where $r_{X_1X_2}$ may be taken as the correlation coefficient between the
pairs of observations. Obviously, if the correlation were equal to 
zero, the standard error of the difference between the means would 
not be reduced from what it would be if we had not made use of 
matching at all. 
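The identity of formulas (64) and (65) follows from expanding the squared deviations of the differences. The sketch below uses some symbols introduced here rather than taken from the text:

- $s_1$ and $s_2$ are the standard deviations of the two sets of paired scores, each computed with $n - 1$ in the denominator. Hence $s_{\bar{X}_1} = s_1/\sqrt{n}$ and $s_{\bar{X}_2} = s_2/\sqrt{n}$.
- $r_{X_1X_2}$ is the product-moment correlation of the pairs.

Each deviation of a difference from the mean difference, $(X_1 - X_2) - (\bar{X}_1 - \bar{X}_2)$, equals $(X_1 - \bar{X}_1) - (X_2 - \bar{X}_2)$. Therefore

$$
\begin{aligned}
\sum x^2 &= \sum \left[(X_1 - \bar{X}_1) - (X_2 - \bar{X}_2)\right]^2 \\
&= \sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2 - 2 \sum (X_1 - \bar{X}_1)(X_2 - \bar{X}_2) \\
&= (n-1)s_1^2 + (n-1)s_2^2 - 2(n-1)\, r_{X_1X_2}\, s_1 s_2 \\[4pt]
s_{\bar{X}_1 - \bar{X}_2}^2 = \frac{\sum x^2}{n(n-1)} &= \frac{s_1^2}{n} + \frac{s_2^2}{n} - 2\, r_{X_1X_2} \frac{s_1}{\sqrt{n}} \frac{s_2}{\sqrt{n}} = s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2 - 2\, r_{X_1X_2}\, s_{\bar{X}_1} s_{\bar{X}_2}
\end{aligned}
$$

With $r_{X_1X_2} = 0$ the last term vanishes. With a negative correlation it is added rather than subtracted.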

Formula (65) also emphasizes the fact that if the subjects are so matched that the correlation between the performances of the paired subjects under the experimental conditions is negative, then the standard error of the difference between the means will be larger than it would be if matching had not occurred. The analogous situation in the analysis of variance is that the residual mean square (the interaction between the matching variable and the experimental conditions) will also be larger than the mean square within groups.²

The moral is plain. If matching is to be effective in reducing the error term used to evaluate the difference between experimental conditions, the performances of the matched subjects under the experimental conditions must be positively correlated.

Within the limitations noted, an experimental design which 
involves matching is a useful one and one which may be applied to 
many psychological and educational experiments. 


9. THE IMPORTANCE OF CONSIDERING POSSIBLE INTERACTIONS


In experimental designs involving matched groups, such as those
discussed in this chapter, the experimenter’s primary concern is
the significance of the experimental variable. The matching 
variable is introduced because the experimenter wishes to obtain a 
smaller (residual) mean square for testing the significance of the 
mean square for the experimental conditions than would be the 
case with an ordinary two-part analysis. He matches subjects 
because he has some prior reason for believing that the performance 
of matched subjects under the experimental conditions will be 
positively correlated. 

There are many experiments, however, where the presence of 
a significant interaction would be a finding of considerable im- 
portance. In an experiment upon the effectiveness of various 
methods of instruction, for example, subjects might be given an 
intelligence test and then equated across rows upon the basis of the 
test scores. Several methods of instruction might constitute the 
experimental variable. In the analysis of the results, the investigator would isolate the row sum of squares and remove it from the within-groups (columns) sum of squares. He would thus be using an interaction mean square for his estimate of experimental error. But it may happen that a given method of instruction being investigated is not equally effective at all levels of intelligence. That is, one particular method of instruction may be very effective with subjects of a high level of intelligence and, at the same time, not be effective at all with subjects of lower intelligence. Another method might work very well with subjects of lower intelligence, but not as well with subjects of higher intelligence. The interaction between the 2 variables, method of instruction and level of intelligence, may very well prove to be significant. There is, however, no way of testing this in the experimental design used. With only one subject for each combination of row and column, the interaction mean square must itself serve as the estimate of error.

² An illustration of this situation is given in Example 4 at the end of this chapter.

The interaction mean square may be used to test the sig- 
nificance of the mean square for methods of instruction, and, if the 
value of F is significant, it is possible for the experimenter to 
generalize concerning the methods of instruction over the range of 
intelligence investigated. Significant differences among the 
methods of instruction may be present even when the interaction 
is significant. But it would obviously be of educational impor- 
tance if it were found that one method of instruction is more 
effective with one level of intelligence and another method more 
effective with other levels of intelligence. This is what a significant 
interaction might mean, and unless the possibility of the presence 
and importance of the interaction is considered in the design of the 
experiment, it will not be possible to make the desired test of 
significance. 

Let us consider another example. Suppose that we gave a 
group of subjects an attitude test and on the basis of initial scores 
equated them across rows. The experimental variable, let us say, 
consists of different methods of presenting propaganda, and we are 
interested in the shifts of attitudes under the various conditions of 
presentation. The sum of squares between rows would be isolated 
and removed from the sum of squares within columns (within 
groups). The estimate of error would thus be the interaction mean 
square. Let us assume that the mean square between methods of 
presentation turns out to be significant when tested against the
interaction mean square. Now the experimenter might rest
content in his knowledge that the methods of presentation result 
in significant differences in attitudes, but it might also be of con- 
siderable psychological importance if a significant interaction 
existed between the initial attitude of the subjects and the method 
of presentation of the propaganda. 

We may think of the subjects from row to row as differing, because of the method of initial assignment, in the intensity of their attitude toward a given issue, ranging from strongly favorable to strongly unfavorable. A significant interaction may indicate that one method of propaganda presentation is more effective with subjects who have favorable attitudes and another method more effective with subjects who have unfavorable attitudes. Thus,
the nature of the modification of the attitude would depend not 
only upon the method of presenting the propaganda, but also upon 
the initial attitude of the subjects. The failure to consider this 
possibility and to incorporate a test of significance of the interac- 
tion in the design of the experiment would be unfortunate. 


10. EXAMPLES 

1. Here is an easy set of measures for practice. Assume that 
the subjects were given a pretest and divided into 10 levels upon 
the basis of the test with 3 subjects within each level. Assume 
also that within each level subjects were assigned at random to 
one of 3 experimental conditions. The data given below are the 
records of performance during the critical test situation, i.e., under 
the experimental conditions. Analyze the total sum of squares 
into 3 parts. 

Levels of Initial Ability 


1 2 3 4 5 6 7 8 9 10


Group A 21 20 2 18 18 18 18 16 16 
Group B 20 19 19 19 17 18 16 15 D: 15 
Group C 22 21 22 20 18 19 19 18 am - 

2. Peters (1944), in an extremely enlightening article on the nature of the residual sum of squares in experiments of the type described in this chapter, cites a hypothetical methods experiment in education. Assume that 5 methods of teaching have been tried in schools from cities with varying populations. Suppose, for example, that the succeeding rows represent cities of increasing
size. (a) Analyze the total sum of squares into 2 parts, that 
within columns and that between columns. This is the familiar 
two-part analysis of earlier chapters. Test the mean square between
columns (methods) for significance. (b) Now, taking
cognizance of the row classification, compute the sum of squares 
for rows (schools), and using the new residual sum of squares as the 
error term, test for significance of the methods mean square. 
(c) How would you account for the discrepancy between the con- 
clusions that would be drawn from the above tests? 


Mean Scores by Schools and by Methods

                     Methods
Schools     A     B     C     D     E      Sum
   1       25    27    24    28    22      126
   2       24    32    29    26    24      135
   3       31    35    27    36    26      155
   4       40    45    33    42    30      190
   5       43    50    38    46    33      210
   6       45    48    40    52    36      221
  Sum     208   237   191   230   171    1,037
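A minimal Python sketch for checking the hand computations in Example 2 follows. It assumes the table is complete as printed (6 schools by 5 methods, one mean per cell). It uses a hypothetical helper, `analyze`, that returns two F ratios:

- the two-part F, with methods tested against the within-columns mean square;
- the three-part F, with methods tested against the residual left after the rows sum of squares is removed.

```python
# Example 2: mean scores, rows = schools 1-6, columns = methods A-E.
DATA = [
    [25, 27, 24, 28, 22],
    [24, 32, 29, 26, 24],
    [31, 35, 27, 36, 26],
    [40, 45, 33, 42, 30],
    [43, 50, 38, 46, 33],
    [45, 48, 40, 52, 36],
]


def analyze(table):
    """Two-part and three-part analyses of a rows x columns table, one entry per cell."""
    r, c = len(table), len(table[0])
    n_total = r * c
    correction = sum(map(sum, table)) ** 2 / n_total
    total = sum(x * x for row in table for x in row) - correction
    col_sums = [sum(row[j] for row in table) for j in range(c)]
    between_cols = sum(s * s for s in col_sums) / r - correction
    between_rows = sum(sum(row) ** 2 for row in table) / c - correction
    within_cols = total - between_cols       # error term of the two-part analysis
    residual = within_cols - between_rows    # rows x columns: three-part error term
    ms_cols = between_cols / (c - 1)
    return {
        "total": total,
        "between_cols": between_cols,
        "between_rows": between_rows,
        "residual": residual,
        "F_two_part": ms_cols / (within_cols / (n_total - c)),
        "F_three_part": ms_cols / (residual / ((r - 1) * (c - 1))),
    }


for name, value in analyze(DATA).items():
    print(f"{name:>14}: {value:10.3f}")
```

Part (c) turns on comparing the two error terms that this function computes.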


3. Assume that 20 subjects were pretested and then paired 
on the basis of initial performance. One member of each pair was 
then assigned at random to an experimental group and the other 
member to a control group. The members of the experimental 
group were then subjected to some experimental condition and 
both groups were retested. The data given below are the per- 
formance scores of the paired subjects on the final test. (a) Using 
the methods of this chapter, analyze the total sum of squares into 
3 parts and find the value of F for the test of significance of the 
mean square between groups. (b) Take the difference between 
the pairs of scores and apply the t test as described in the chapter.
You should find, within errors of rounding, that $t^2$ is equal to F.


Pairs of Subjects

                       1    2    3    4    5    6    7    8    9   10
Experimental group   2.5  4.6  9.3  4.5  1.5  6.4  4.7  5.6  7.3  6.6
Control group        3.6  5.7  8.9  6.7  1.9  7.8  4.6  5.9  6.9  7.0


4. The set of paired measures given below illustrates one of 
the points made in the chapter with easy computations. (a) Ana- 
lyze the total sum of squares into 2 parts, that within groups and 
that between groups, and find the value of F. (b) Now analyze the 
total sum of squares into 3 parts. Note that the residual mean 
square, which would be used as the error term in this analysis, is 
larger than the mean square within groups when the two-part 
analysis is made. An examination of the scores will show that the 
correlation between the performance of the pairs of subjects is 
negative. Compare this finding with the arrangement of the pairs 
of scores in Example 3. There you will observe that the correla- 
tion is positive. (c) To further clarify the issue, make the easy 
computations which are involved in applying the t test to the data
without taking cognizance of the matching, i.e., the fact that the 
observations have been paired. (d) Now compute the standard 
error of the difference by the method described in this chapter, 
noting that this method takes into account the correlation between 
the pairs of scores. But since the correlation is negative, it can 
easily be seen from formula (65) that the standard error of the 
difference will be greater than the value obtained in (c). If you 
will study carefully these arithmetic results, the discussion of the 
chapter will be much clearer. 


Pairs of Subjects   Group A   Group B   Sum
        1              13        14      27
        2              12        13      25
        3              16        12      28
        4              15        11      26
        5              14        10      24
       Sum             70        60     130
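The arithmetic asked for in parts (c) and (d) of Example 4 can be sketched in Python as follows. The variable names are introduced here for illustration. `s_mean_a` and `s_mean_b` denote the standard errors of the single means, $s/\sqrt{n}$, so the last computation is formula (65) written out.

```python
# Example 4: paired scores for Groups A and B.
group_a = [13, 12, 16, 15, 14]
group_b = [14, 13, 12, 11, 10]
n = len(group_a)

mean_a, mean_b = sum(group_a) / n, sum(group_b) / n
ss_a = sum((x - mean_a) ** 2 for x in group_a)
ss_b = sum((y - mean_b) ** 2 for y in group_b)
cross = sum((x - mean_a) * (y - mean_b) for x, y in zip(group_a, group_b))
r = cross / (ss_a * ss_b) ** 0.5   # correlation between the paired scores

# (c) Ignore the pairing: pooled within-groups variance with 2n - 2 df.
pooled_var = (ss_a + ss_b) / (2 * n - 2)
se_unpaired = (pooled_var * (1 / n + 1 / n)) ** 0.5
t_unpaired = (mean_a - mean_b) / se_unpaired

# (d) Use the pairing: formula (64) applied to the differences, n - 1 df.
diffs = [x - y for x, y in zip(group_a, group_b)]
mean_d = sum(diffs) / n
se_paired = (sum((d - mean_d) ** 2 for d in diffs) / (n * (n - 1))) ** 0.5
t_paired = mean_d / se_paired

# Formula (65): the same paired standard error expressed through r.
s_mean_a = (ss_a / (n - 1) / n) ** 0.5
s_mean_b = (ss_b / (n - 1) / n) ** 0.5
se_65 = (s_mean_a ** 2 + s_mean_b ** 2 - 2 * r * s_mean_a * s_mean_b) ** 0.5

print(f"r = {r:.2f}")
print(f"unpaired: SE = {se_unpaired:.4f}, t = {t_unpaired:.4f}")
print(f"paired:   SE = {se_paired:.4f}, t = {t_paired:.4f}, formula (65) SE = {se_65:.4f}")
```

These data give r = −.50. The paired standard error (about 1.225) is therefore larger than the unpaired one (1.000). Squaring each t reproduces the corresponding F:

- 4.00 for the two-part analysis;
- about 2.67 for the three-part analysis.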


5. Assume that subjects have been given an intelligence test 
and divided into groups of 3 with comparable scores. Within each 
level of intelligence subjects have been assigned at random to one 
of 3 methods of instruction. At the end of the school term, the 
subjects are given a standardized achievement test, and their 
scores on this test are recorded below. We note that the subjects 
have been matched across rows in terms of scores on the test of 
intelligence given at the beginning of the term, so that, following the methods of analysis of this chapter, the total sum of squares will be analyzed into 3 parts. Test for the significance of the mean square for methods.


Initial Intelligence        Methods
       Levels             A     B     C
          1              18    19    15
          2              15    14    13
          3              16    18    17
          4              18    14    15
          5              19    13    13
          6              20    16    11
          7              19    17    15
          8              17    16    14
          9              18    17    13
         10              20    16    14
         11              16    16    19
         12              11    15    18
         13              11    19    19
         14              16    17    21
         15              15    15    18
         16              14    16    21
         17              13    16    20
         18              16    15    17
         19              14    14    18
         20              14    17    19
         21              17    17    15
         22              13    15    19
         23              16    18    18
         24              12    15    17
         25              13    15    17
         26              15    18    18
         27              13    16    16
         28              13    16    18
         29              14    16    15
         30              14    14    17


CHAPTER 15 


Experimental Designs Involving Repeated 
Measurements of the Same Subjects 


1. A PROBLEM IN EXPERIMENTAL DESIGN 


Suppose that an experiment has been carried out in the 
following manner. A group of n subjects is tested with a stylus 
maze and then at a later period the same group of subjects is tested 
on the same maze, but under a condition where they have been 
given a dose of benzedrine. The mean performance of the subjects 
is obtained on the first trial, and the mean for the second trial is 
also obtained. The difference between the means is to be tested 
for significance. 

This experimental design is analogous to the one discussed 
in the previous chapter in which observations were made upon 
matched pairs of subjects. The standard error of the difference 
between the means which is necessary for the test of significance 
would be that given by formula (64) or formula (65). With $n_1$ observations in the first trial and $n_2$ observations in the second, with $n_1 = n_2 = n$, and with the observations paired, the degrees of freedom available for evaluating t would be equal to $n - 1$. If 20 subjects are tested, then the degrees of freedom would be equal to 19. Let us suppose that the mean on the second trial is larger than the mean on the first trial and that the t obtained is significant with P less than .01.

The results of this experiment are ambiguous. The difference between the means is significant, but when we examine the structure of the experiment to account for the difference, we are faced with several possibilities. The difference may be the result of practice effects or learning from the first to the second trial. The difference may be due to the influence of the benzedrine. The difference may reflect both the influence of benzedrine and learning. The structure of the experiment makes it impossible to give a clear-cut interpretation of the results obtained.

Now the difficulty here, if the main interest is in the influence 
of the benzedrine, may be overcome by the use of a control group. 
A total of N subjects may be divided at random into 2 groups with 
n subjects in each. One of the groups would then serve as a control 
and the other as an experimental group. The control group 
would be tested twice under the same conditions, i.e., without
benzedrine. The experimental group would be given the first
trial without benzedrine and the second trial with benzedrine.¹
Then any shift or change in performance of the control group could 
not be attributed to benzedrine and might be assumed to be due 
to practice or learning. 

The mean change or gain in performance for the control group would be given by the difference between the means for the first and second trials. If we let the first subscript refer to the group and the second to the trial, then this mean gain may be symbolized by $\bar{X}_1 = \bar{X}_{12} - \bar{X}_{11}$. Since we have a pair of measures for each subject in the control group, the standard error of $\bar{X}_1$ may be obtained by means of formula (64). Similarly, the mean gain or change in performance for the experimental group may be symbolized by $\bar{X}_2 = \bar{X}_{22} - \bar{X}_{21}$, and the standard error of $\bar{X}_2$ may also be obtained from formula (64). Then the standard error of the difference between the mean gains, $\bar{X}_2 - \bar{X}_1$, may be found by substituting $s_{\bar{X}_1}^2$ and $s_{\bar{X}_2}^2$ in formula (43), which will give $s_{\bar{X}_2 - \bar{X}_1} = \sqrt{s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2}$. The number of degrees of freedom available for the test of significance will be equal to $(n_1 - 1) + (n_2 - 1)$. If the t obtained by dividing the difference $\bar{X}_2 - \bar{X}_1$ by the standard error of the difference between the mean gains is significant, then this would indicate that the benzedrine contributes something to the performance in addition to the effects of practice.²
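The control-group comparison just described can be sketched in Python. The scores are hypothetical, and `mean_gain` and `compare_gains` are illustrative names. As the text states, the two squared standard errors from formula (64) are combined as in formula (43), with $(n_1 - 1) + (n_2 - 1)$ degrees of freedom.

```python
def mean_gain(first_trial, second_trial):
    """Mean gain (trial 2 minus trial 1) and its standard error by formula (64)."""
    n = len(first_trial)
    gains = [after - before for before, after in zip(first_trial, second_trial)]
    m = sum(gains) / n
    sum_x2 = sum((g - m) ** 2 for g in gains)   # deviations of gains from mean gain
    return m, (sum_x2 / (n * (n - 1))) ** 0.5, n


def compare_gains(control, experimental):
    """t for the difference between the mean gains of two independent groups."""
    m1, se1, n1 = mean_gain(*control)
    m2, se2, n2 = mean_gain(*experimental)
    se_diff = (se1 ** 2 + se2 ** 2) ** 0.5      # independent groups: no covariance term
    return (m2 - m1) / se_diff, (n1 - 1) + (n2 - 1)


# Hypothetical scores: (first trial, second trial) for each group.
control = ([10, 12, 9, 14, 11], [12, 13, 11, 15, 13])
experimental = ([11, 10, 13, 12, 9], [15, 13, 17, 15, 13])

t, df = compare_gains(control, experimental)
print(f"t = {t:.3f} with {df} df")
```

The two groups are formed by random division, so no correlation term enters $s_{\bar{X}_2 - \bar{X}_1}$. The pairing is used only within each group, when each gain's standard error is computed.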


¹ This design would still not differentiate between the possible attitudinal reactions of the subjects in the experimental group to the drug and the physiological influence of the drug itself. The design might be increased in scope with another group which would be given a placebo on the second trial.

² The data could also be analyzed by covariance techniques. The analysis of covariance is described in Chap. 17.


2. TESTING THE SIGNIFICANCE OF PRACTICE EFFECTS FOR A SINGLE GROUP GIVEN 2 TRIALS


In the experiment described, the problem was to rule out the 
effects of practice or learning. In many experiments, however, the 
practice effects themselves are of major interest, and the design 
which is most often used in such problems involves repeated 
measurements on the same group of subjects. The experimenter 
is interested in knowing, for example, whether there is any 
significant improvement from trial to trial. 

Let us take a simple case where we have r trials with r equal 
to 2. Subjects, for example, might be given a 10-min. trial under 
a given set of conditions, then a 5-min. rest, and then a second 
trial of 10 min. under the same conditions as the first trial. Is 
there any significant difference or change in performance between 
the 2 trials? In the case of but 2 trials, the test of significance is 
most easily made by taking the differences between the pairs of 
observations available for each subject, preserving the algebraic 
signs of the differences, and then applying formula (64) to the 
distribution of difference scores. The difference between the 
means divided by the standard error of the difference obtained by 
formula (64) will give the t ratio with degrees of freedom equal to
the number of differences minus 1. 

An analysis of variance applied to the same experimental 
design would result in a breakdown of the total sum of squares 
based upon the $n_1$ plus $n_2$ observations into 3 component parts.
Since we have 2 measurements available for each subject, it is
possible to obtain a sum and a mean for each subject. We could
then obtain a sum of squares based upon the variation of the
means of the subjects about the combined mean of all N observa-
tions. The number of degrees of freedom for this sum of squares
would be 1 less than the number of means or sums, or, in other
words, 1 less than the number of subjects. This sum of squares
would correspond to the row sum of squares of the analysis of 
variance with matched groups and may be called the sum of squares 
between subjects. 

A second sum of squares based upon the 2 trial means would 
be obtained, and this would correspond to the sum of squares
between groups or between experimental conditions in the analysis
of variance applied to matched groups. In the case of but 2 
groups, as in the problem under discussion, this sum of squares 
would be based upon 1 degree of freedom. 

The third sum of squares to be obtained would be the residual 
or interaction sum of squares. This sum of squares will correspond 
to the interaction between the rows and columns or, in the present 
instance, to the interaction between subjects and trials. As 
usual, this sum of squares may be obtained most easily by sub- 
traction, using any of the methods indicated below: 


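$$\text{R} \times \text{C interaction} = \text{within trials (columns)} - \text{between subjects (rows)} \qquad (66)$$

$$\text{R} \times \text{C interaction} = \text{within subjects (rows)} - \text{between trials (columns)} \qquad (67)$$

$$\text{R} \times \text{C interaction} = \text{total} - \text{between subjects (rows)} - \text{between trials (columns)} \qquad (68)$$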


With 20 subjects tested twice, we would have a total of 40 
observations with 39 degrees of freedom. The sum of squares 
between subjects would account for 19 degrees of freedom; the 
sum of squares between trials would account for 1 degree of 
freedom; and the interaction sum of squares would account for 19 
degrees of freedom. 

The test of significance of the mean square between trials 
would be given by the F obtained when this mean square is divided 
by the interaction mean square. The F thus obtained will be 
equal, within errors of rounding, to the value of $t^2$ obtained when
formula (64) is used for the standard error of the difference. 


3. THE SIGNIFICANCE OF PRACTICE EFFECTS FOR A SERIES OF TRIALS


Now a logical extension of this experimental design can be 
made for the case where we have r trials and r is greater than 2. 
The same subjects, for example, may be given 5 trials with rest 
periods between the trials. If we had 20 subjects, we would have 
5 observations on each subject giving a total of (5)(20) = 100 observations. The total sum of squares, based upon the deviations of these 100 observations from the combined mean, would have 99 degrees of freedom and could be analyzed into the 3 component parts. One sum of squares would be based upon the variation of the 5 trial means about the combined mean, giving r − 1 = 4 degrees of freedom. A second sum of squares would be given by the variation of the means of the subjects, each mean being based upon r = 5 observations, about the combined mean. This sum of squares between subjects would have n − 1 degrees of freedom or, in the present case, 20 − 1 = 19 degrees of freedom. The remaining degrees of freedom would be associated with the sum of squares for the interaction between subjects and trials and would be given by (r − 1)(n − 1) = (4)(19) = 76. The interaction mean square would be used to test the significance of the mean square between trials.
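The partition just described can be sketched in Python for any n subjects and r trials. The function name `repeated_measures` and the 4 × 3 set of scores are hypothetical. The interaction is found by subtraction, as in formula (68), and serves as the error term for trials.

```python
def repeated_measures(scores):
    """scores[i][j] is subject i on trial j. Returns sums of squares, df, and F for trials."""
    n, r = len(scores), len(scores[0])
    correction = sum(map(sum, scores)) ** 2 / (n * r)
    total = sum(x * x for row in scores for x in row) - correction
    subjects = sum(sum(row) ** 2 for row in scores) / r - correction
    trial_sums = [sum(row[j] for row in scores) for j in range(r)]
    trials = sum(t * t for t in trial_sums) / n - correction
    interaction = total - subjects - trials        # formula (68)
    df = {"subjects": n - 1, "trials": r - 1,
          "interaction": (r - 1) * (n - 1), "total": n * r - 1}
    F = (trials / df["trials"]) / (interaction / df["interaction"])
    ss = {"subjects": subjects, "trials": trials,
          "interaction": interaction, "total": total}
    return ss, df, F


# Hypothetical data: 4 subjects, 3 trials each.
scores = [[3, 5, 6], [2, 4, 7], [4, 4, 8], [5, 7, 9]]
ss, df, F = repeated_measures(scores)
print(ss, df, f"F = {F:.2f}")
```

With 20 subjects and 5 trials, the same degrees-of-freedom expressions give 19, 4, and 76, as in the text.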


4. THE ANALYSIS OF REPEATED MEASUREMENTS ON SEVERAL INDEPENDENT GROUPS


The analysis of variance applied to experiments involving repeated measurements becomes somewhat more complex when several different groups of subjects are involved, with repeated measurements upon the subjects within each of the groups. This is a common type of problem in so-called learning experiments, where several groups are given a series of trials and each group performs under a different method, set of instructions, or experimental condition.³

Let us suppose that we have 3 different methods A, B, and C. For simplicity let us assume that 15 subjects are available and that they have been divided at random into 3 groups of 5 subjects each, one group to be tested under each method. Each subject is tested 3 times on successive days, the daily trials being designated by I, II, and III. Since we have a total of 15 subjects with 3 observations on each subject, we will have a total of 45 measurements. Assume that the results of the experiment are as given in Table 61.

³ See the article by Humphreys (1943) on conditioning and the article by Gilliland and Humphreys (1943) on time […]. So far as the writer is aware, these were the first psychological studies using the method of analysis to be described. Later articles by Alexander (1946), Lindquist (1947), and Kogan (1948) deal with the same problem.


TABLE 61. Performance Scores for 3 Groups of Subjects with Each Group Tested under a Different Method and with 3 Trials for Each Group

                           Trials
Methods   Subjects     I    II   III    Sum
   A          1        2     4     7     13
              2        2     6    10     18
              3        3     7    10     20
              4        7     9    11     27
              5        6     9    12     27
   B          1        5     6    10     21
              2        4     5    10     19
              3        7     8    11     26
              4        8     9    11     28
              5       11    12    13     36
   C          1        3     4     7     14
              2        3     6     9     18
              3        4     7     9     20
              4        8     8    10     26
              5        7    10    10     27
  Sum                 80   110   150    340


5. CALCULATION OF THE SUMS OF SQUARES 


There are several ways in which the analysis of the data of 
Table 61 might be undertaken. We shall proceed here with one 
method which is convenient, and then in a later section we shall 
discuss some alternatives. 

The total sum of squares, based upon 44 degrees of freedom, 
is computed in the usual manner by squaring the individual entries 
in the table, summing, and subtracting the correction term for 
origin. Thus,

$$\text{Total} = (2)^2 + (4)^2 + (7)^2 + \cdots + (10)^2 - \frac{(340)^2}{45} = 369.111$$


Now let us take the sums for the individual subjects at the right of Table 61. We have 15 sums giving 14 degrees of freedom. The sum of squares calculated from these sums will be the sum of squares between subjects and is obtained by

$$\frac{(13)^2}{3} + \frac{(18)^2}{3} + \frac{(20)^2}{3} + \cdots + \frac{(27)^2}{3} - \frac{(340)^2}{45} = 175.778$$


The sum of squares just calculated may be further analyzed into 2 components. We can obtain a sum of squares based upon the variation of the 3 means A, B, and C about the combined mean. Keeping in mind that the sums for A, B, and C are each based upon 15 observations, we have

$$\frac{(105)^2}{15} + \frac{(130)^2}{15} + \frac{(105)^2}{15} - \frac{(340)^2}{45} = 27.778$$

It is important to note at this point that the sum of squares for the methods groups A, B, and C, which is equal to 27.778, is based upon the performance of randomly assigned, independent subjects. Hence it does not involve correlation between the means of the methods groups.

A second sum of squares may be obtained from the variation of the individual subject means about the means of the methods groups to which they belong. Thus, for the first methods group A, we have

$$\frac{(13)^2}{3} + \frac{(18)^2}{3} + \frac{(20)^2}{3} + \frac{(27)^2}{3} + \frac{(27)^2}{3} - \frac{(105)^2}{15} = 48.667$$

For the second methods group B, we have

$$\frac{(21)^2}{3} + \frac{(19)^2}{3} + \frac{(26)^2}{3} + \frac{(28)^2}{3} + \frac{(36)^2}{3} - \frac{(130)^2}{15} = 59.333$$

And for the third methods group C, we have

$$\frac{(14)^2}{3} + \frac{(18)^2}{3} + \frac{(20)^2}{3} + \frac{(26)^2}{3} + \frac{(27)^2}{3} - \frac{(105)^2}{15} = 40.000$$


Each of the three sums of squares calculated above will be based upon 4 degrees of freedom. The sum of these sums of squares, 48.667 + 59.333 + 40.000 = 148.000, will be based upon 12 degrees of freedom. It may be observed that the sum of the 2 sums of squares, 27.778 + 148.000, equals the total sum of squares between subjects, 175.778.

We have now accounted for 14 of the total of 44 degrees of 
freedom. What about the remaining 30 degrees of freedom? 
With what sum of squares will they be associated? If we sub- 
tract the total sum of squares between subjects 175.778 from the 
total sum of squares 369.111, we have, as in the usual two-part 
analysis of variance, a sum of squares within rows equal to 
369.111 − 175.778 = 193.333. Since the rows correspond to subjects, this sum of squares 193.333 might be called the within-individuals (intrasubject variation) sum of squares in contrast
with the sum of squares 175.778, based upon the variation between 
individuals (intersubject variation). 

Let us now see how the within-subjects sum of squares 
193.333 with 30 degrees of freedom may be further analyzed. We 
can obviously compute a sum of squares based upon the 3 trials 
I, II, and III. This sum of squares will be given by 


$$\frac{(80)^2}{15} + \frac{(110)^2}{15} + \frac{(150)^2}{15} - \frac{(340)^2}{45} = 164.444$$


and will account for 2 of the 30 degrees of freedom within subjects. 
If we now take the sum of squares for trials 164.444 and sub- 
tract it from the sum of squares within subjects 193.333, we obtain 
a remainder equal to 28.889, based upon 28 degrees of freedom. 
This sum of squares is the interaction sum of squares for subjects and 
trials. It may also be analyzed into 2 components. One portion 
will be based upon the interaction between trials and methods and 
may be obtained in the usual way from Table 62. Each sum in 
the cells of Table 62 is based upon 5 observations, and the necessary 
calculations for the interaction between trials and methods are
as follows: 
$$\text{Between cells} = \frac{(20)^2}{5} + \frac{(35)^2}{5} + \frac{(50)^2}{5} + \cdots + \frac{(45)^2}{5} - \frac{(340)^2}{45} = 201.111$$


TABLE 62. Sums of Performance Scores for Trials and Methods

                 Trials
Methods      I     II    III    Sum
   A        20     35     50    105
   B        35     40     55    130
   C        25     35     45    105
  Sum       80    110    150    340


and we have already calculated the sum of squares

$$\text{Between trials} = \frac{(80)^2}{15} + \frac{(110)^2}{15} + \frac{(150)^2}{15} - \frac{(340)^2}{45} = 164.444$$

and we have also calculated previously the sum of squares

$$\text{Between methods} = \frac{(105)^2}{15} + \frac{(130)^2}{15} + \frac{(105)^2}{15} - \frac{(340)^2}{45} = 27.778$$

and thus by subtraction, we obtain the sum of squares for

$$\text{Interaction: trials} \times \text{methods} = 201.111 - 164.444 - 27.778 = 8.889$$

The degrees of freedom for the interaction sum of squares for trials and methods will be given by the product of the degrees of freedom of the interacting variables, and we thus have (2)(2) = 4 degrees of freedom for this sum of squares.

Before leaving the sum of squares for the interaction of trials and methods, we may examine it somewhat more carefully to note that it is based upon the presence of the same subjects in the comparison. This is not easily observed in terms of the method of obtaining the interaction sum of squares used above, namely, subtraction. But it so happens that in a 3 × 3 table, such as Table 62, the interaction sum of squares may be calculated directly from the diagonals of the table. This method of obtaining the interaction sum of squares has the virtue of providing a check upon the accuracy of previous calculations. It also has the virtue, in the case at hand, of illustrating quite definitely that the interaction is based upon the presence of the same subjects in the sums compared, and should make clearer the logic of the complete analysis of
variance and the tests of significance which we will wish to make. 

Let us take first the sums of the upper left to lower right 
diagonals of Table 62. We shall have for one sum IA + IIB + 
IIIC. Another will be given by IIA + IIIB + IC. The third 
will be given by IB + IIC + IIIA. Thus, we have 


$$\frac{(20 + 40 + 45)^2}{15} + \frac{(35 + 55 + 25)^2}{15} + \frac{(35 + 35 + 50)^2}{15} - \frac{(340)^2}{45} = 7.778$$


Now let us take the sums of the upper right to lower left 
diagonals of Table 62. We shall have for one of the sums 
IIIA + IIB + IC. Another will be given by IIA + IB + IIIC. 
The third will be IIIB + IIC + IA. Thus, we have 


$$\frac{(50 + 40 + 25)^2}{15} + \frac{(35 + 35 + 45)^2}{15} + \frac{(55 + 35 + 20)^2}{15} - \frac{(340)^2}{45} = 1.111$$


We may note that the sum of these 2 sums of squares, 7.778 + 1.111 = 8.889, is the value of the sum of squares for the interaction between trials and methods. More important for our purpose, however, is that we may see that the sums upon which this interaction sum of squares is based do involve the presence of the same subjects. In this respect, the interaction sum of squares is similar to that for trials I, II, and III, rather than to that for methods A, B, and C. The methods sum of squares was based upon independent, randomly assigned subjects, whereas the sum of squares for trials also involved the presence of the same subjects.
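The diagonal check can also be sketched in Python. As the text notes, the method applies only to a 3 × 3 table. The cell totals are those of Table 62. Each cell is based upon 5 observations, so each diagonal total is based upon 15. The helper name `diagonal_ss` is introduced here.

```python
# Table 62: cell sums, rows = methods A, B, C; columns = trials I, II, III.
CELLS = [[20, 35, 50],
         [35, 40, 55],
         [25, 35, 45]]
PER_CELL = 5                         # observations in each cell sum
GRAND = sum(map(sum, CELLS))         # 340
CORRECTION = GRAND ** 2 / (9 * PER_CELL)


def diagonal_ss(direction):
    """SS from the 3 wrapped diagonals; +1 runs upper left to lower right, -1 the other way."""
    sums = [sum(CELLS[m][(k + direction * m) % 3] for m in range(3)) for k in range(3)]
    return sum(s * s for s in sums) / (3 * PER_CELL) - CORRECTION


cells_ss = sum(x * x for row in CELLS for x in row) / PER_CELL - CORRECTION
trials_ss = sum(sum(row[j] for row in CELLS) ** 2 for j in range(3)) / (3 * PER_CELL) - CORRECTION
methods_ss = sum(sum(row) ** 2 for row in CELLS) / (3 * PER_CELL) - CORRECTION
interaction_ss = cells_ss - trials_ss - methods_ss

d1, d2 = diagonal_ss(+1), diagonal_ss(-1)
print(f"diagonals: {d1:.3f} + {d2:.3f} = {d1 + d2:.3f}; by subtraction: {interaction_ss:.3f}")
```

With `direction=+1` the three sums are IA + IIB + IIIC, IIA + IIIB + IC, and IIIA + IB + IIC, which are the sums listed in the text. With `direction=-1` they are the opposite diagonals.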

The interaction sum of squares for trials and methods accounts 
for one portion of the interaction sum of squares for subjects and 
trials. We still have a remainder which will be equal to 28.889 − 8.889 = 20.000 and which will be based upon 24 degrees of freedom. What is the nature of this remainder? In brief, it is the pooled interactions for subjects and trials for each methods group considered separately. For example, the analysis of variance of the scores for the method A group alone would take the form:


Source of Variation                df
Subjects                            4
Trials                              2
Interaction: subjects × trials      8
Total                              14


Calculation of the subjects × trials interaction for the A group results in a sum of squares equal to 5.33. The similar interactions for the B and C groups are equal to 6.67 and 8.00, respectively. Each of these interaction sums of squares is based upon 8 degrees of freedom. Pooling the sums of squares and the degrees of freedom, we obtain the pooled interaction sum of squares 20.000, based upon 24 degrees of freedom.⁴
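The whole partition of Table 61 can be reproduced with a short Python sketch, which may help in checking the hand calculations. It assumes equal numbers of subjects in every methods group. The function and key names are illustrative only. Subject 5 of group C is entered as (7, 10, 10), the scores implied by the group C totals in Table 62 and that subject's sum of 27.

```python
# Table 61: for each method, one (I, II, III) tuple per subject.
# Group C, subject 5 taken as (7, 10, 10), as implied by the Table 62 totals.
TABLE_61 = {
    "A": [(2, 4, 7), (2, 6, 10), (3, 7, 10), (7, 9, 11), (6, 9, 12)],
    "B": [(5, 6, 10), (4, 5, 10), (7, 8, 11), (8, 9, 11), (11, 12, 13)],
    "C": [(3, 4, 7), (3, 6, 9), (4, 7, 9), (8, 8, 10), (7, 10, 10)],
}


def sq_over(totals, count):
    """Sum of squared totals, each divided by the number of scores it contains."""
    return sum(t * t for t in totals) / count


def analyze(groups):
    g = len(groups)
    n = len(next(iter(groups.values())))       # subjects per group (equal n assumed)
    r = len(next(iter(groups.values()))[0])    # trials per subject
    scores = [x for subs in groups.values() for s in subs for x in s]
    cf = sum(scores) ** 2 / len(scores)        # correction term
    ss = {"total": sum(x * x for x in scores) - cf}
    ss["between subjects"] = sq_over([sum(s) for subs in groups.values() for s in subs], r) - cf
    ss["methods"] = sq_over([sum(map(sum, subs)) for subs in groups.values()], n * r) - cf
    ss["subjects in same group"] = ss["between subjects"] - ss["methods"]
    trial_totals = [sum(s[j] for subs in groups.values() for s in subs) for j in range(r)]
    ss["trials"] = sq_over(trial_totals, g * n) - cf
    cell_totals = [sum(s[j] for s in subs) for subs in groups.values() for j in range(r)]
    ss["trials x methods"] = sq_over(cell_totals, n) - cf - ss["trials"] - ss["methods"]
    ss["pooled subjects x trials"] = (ss["total"] - ss["between subjects"]
                                      - ss["trials"] - ss["trials x methods"])
    df = {"methods": g - 1, "subjects in same group": g * (n - 1), "trials": r - 1,
          "trials x methods": (g - 1) * (r - 1),
          "pooled subjects x trials": g * (n - 1) * (r - 1)}
    ms = {k: ss[k] / d for k, d in df.items()}
    F = {"methods": ms["methods"] / ms["subjects in same group"],
         "trials": ms["trials"] / ms["pooled subjects x trials"],
         "trials x methods": ms["trials x methods"] / ms["pooled subjects x trials"]}
    return ss, df, F


def subjects_x_trials(subs):
    """Subjects x trials interaction SS within a single methods group."""
    n, r = len(subs), len(subs[0])
    cf = sum(map(sum, subs)) ** 2 / (n * r)
    total = sum(x * x for s in subs for x in s) - cf
    subjects = sq_over([sum(s) for s in subs], r) - cf
    trials = sq_over([sum(s[j] for s in subs) for j in range(r)], n) - cf
    return total - subjects - trials


ss, df, F = analyze(TABLE_61)
for name, value in ss.items():
    print(f"{name:>26}: {value:8.3f}")
print({k: round(v, 3) for k, v in F.items()})
print({m: round(subjects_x_trials(s), 3) for m, s in TABLE_61.items()})
```

Dividing exactly gives an F for trials of about 98.67. The figure 98.706 used below comes from dividing by the mean square rounded to .833.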


6. THE TESTS OF SIGNIFICANCE 


The various sums of squares into which we have divided the 
total sum of squares have been entered in the upper part of Table 
63, along with their associated degrees of freedom. The mean 
squares which are shown result from the division of the sums of 
squares by the degrees of freedom. 

In terms of earlier analyses, the mean square based upon the
variation between subjects in the same group may be used to test
the significance of the mean square between methods groups. The
value of F is given by 13.889/12.333 = 1.126, and this value does
not meet the usual requirements of significance for 2 and 12 degrees
of freedom. So we have no evidence that the 3 methods differ in
effectiveness, since the differences between them can reasonably be
accounted for in terms of random sampling from a common population.

In testing the significance of trials and the interaction between 
trials and methods, we shall use as an error term the mean square 
based upon the pooled interaction sums of squares for subjects ×


4 We have pooled the 3 interaction sums of squares and degrees of 
freedom under the assumption that the corresponding mean squares are 
homogeneous. This assumption may be tested by means of χ². The test
is described on pp. 195-197.




REPEATED MEASUREMENTS OF THE SAME SUBJECTS 295 


TABLE 63. Analysis of Variance of Performance Scores of 3 Groups of
Subjects Tested under Different Methods with 3 Trials for Each Group

Source of Variation                     Sum of Squares   df   Mean Square        F
Between methods: A, B, C                       27.778     2       13.889      1.126
Between subjects in same group                148.000    12       12.333
Total between subjects                        175.778    14
Between trials: I, II, III                    164.444     2       82.222     98.706
Interaction: trials × methods                   8.889     4        2.222      2.667
Interaction: pooled subjects × trials          20.000    24         .833
Total within subjects                         193.333    30
Total                                         369.111    44

Source of Variation                     Sum of Squares   df
Between methods: A, B, C                       27.778     2
Between trials: I, II, III                    164.444     2
Interaction: trials × methods                   8.889     4
Total between cells                           201.111     8
Between subjects in same group                148.000    12
Interaction: pooled subjects × trials          20.000    24
Total within cells                            168.000    36
Total                                         369.111    44


trials. The F for trials is given by 82.222/.833 = 98.706, and for
2 and 24 degrees of freedom this is a highly significant value. It is
apparent from an examination of the data that the trial means do
differ, and the test of significance indicates that this variation is
greater than can reasonably be attributed to random sampling
from a common population. The F for the trials × methods
interaction will be equal to 2.222/.833 or 2.667. This value falls




somewhat short of the value of 2.78 required for significance for 
4 and 24 degrees of freedom. 

It is important to note that the test of significance for 
methods A, B, and C is based upon independent, randomly 
assigned subjects, whereas the test of significance for trials and the 
interaction between trials and methods is based upon the same 
subjects and hence the presence of correlation may be involved in 
the latter comparisons. This is the reason why we have two 
separate error terms for the tests of significance: one for the 
possibly correlated data obtained from the same subjects and one 
for the data obtained from the independent groups of subjects. 
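The pairing of each mean square with its own error term can be made explicit in a short Python sketch. The sums of squares and degrees of freedom are those of Table 63; the dictionary keys are our own labels, and the 5 per cent points of F are values taken from standard F tables, not computed by the code. Because the sketch divides by the unrounded error mean square 20/24 rather than by .833, its F for trials comes out near 98.67 rather than the 98.706 printed above.

```python
# Sums of squares and degrees of freedom from the upper part of Table 63
table_63 = {
    "methods": (27.778, 2),
    "subjects_in_groups": (148.000, 12),        # error term for methods
    "trials": (164.444, 2),
    "trials_x_methods": (8.889, 4),
    "pooled_subjects_x_trials": (20.000, 24),   # error term for within-subject effects
}

def mean_square(source):
    ss, df = table_63[source]
    return ss / df

def f_ratio(effect, error):
    return mean_square(effect) / mean_square(error)

# Each effect is paired with the error term that matches how its observations
# were obtained: independent groups for methods, the same subjects otherwise.
tests = {
    "methods": f_ratio("methods", "subjects_in_groups"),
    "trials": f_ratio("trials", "pooled_subjects_x_trials"),
    "trials_x_methods": f_ratio("trials_x_methods", "pooled_subjects_x_trials"),
}
# 5 per cent points of F from standard tables: (2, 12), (2, 24), and (4, 24) df
critical_05 = {"methods": 3.89, "trials": 3.40, "trials_x_methods": 2.78}

for source, f in tests.items():
    verdict = "significant" if f > critical_05[source] else "not significant"
    print(f"{source}: F = {f:.3f} ({verdict} at the 5 per cent level)")
```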


7. RELATING THE ANALYSIS TO PREVIOUS METHODS 


Now let us see if we can relate this analysis to the more 
traditional approach outlined in earlier chapters. This approach 
is shown in the lower part of Table 63. The total sum of squares
as found previously is equal to 369.111 with 44 degrees of freedom.
A sum of squares between the cells of Table 62 may be obtained
in the usual manner, and this will be found equal to 201.111 and
will have 8 degrees of freedom. Thus, the remainder within cells,
based upon the variation of the 5 observations present in each
cell about the cell mean, will yield a sum of squares which may be
found directly or by subtraction of the between-cells sum of squares
from the total sum of squares. This will be found equal to
168.000, and since each cell will contribute 4 degrees of freedom
and we have 9 cells, we will have a total of (4)(9) = 36 degrees of
freedom for the sum of squares within cells. 

Now if we had 5 independent, randomly assigned subjects 
in each of the cells, this sum of squares within cells, divided by the 
36 degrees of freedom, would provide an error term for testing 
the significance of all of the comparisons in which we are interested. 
However, this sum of squares within cells is a conglomeration 
based upon correlated and independent observations. It will, 
therefore, require further analysis. 

The same situation is true of the sum of squares between cells,
201.111, based upon 8 degrees of freedom. This sum of squares


may be further analyzed into a sum of squares between methods
A, B, and C with 2 degrees of freedom, which we have already
found to be equal to 27.778. Another component is the sum of
squares for trials I, II, and III with 2 degrees of freedom, and this
we have found to be equal to 164.444. The third component is the
interaction between trials and methods with 4 degrees of freedom,
which we have found to be equal to 8.889. Note that the
sum of these sums of squares is equal to 201.111, the sum of squares
between the 9 cells, and that what we have accomplished here is
to isolate those sums of squares involving correlated observations,
namely, the trials sum of squares and the interaction sum of
squares for trials and methods, from the one sum of squares based
upon independent observations, the methods sum of squares.

Now, if we take the sum of squares within cells 168.000 with 
its associated 36 degrees of freedom, let us see how we can separate 
this in the same manner. It will be clear from an examination 
of Table 63 that this sum of squares has been analyzed into 2 
components, namely, the 2 error terms of the table. One com- 
ponent is the variation between subjects in the same methods 
group 148.000, which is based upon 12 degrees of freedom, and the 
other component is the pooled sum of squares for the interaction 
between subjects and trials in the separate methods groups 20.000, 
which is based upon 24 degrees of freedom. The first of these error 
terms is based upon the variation of independent observations. 
The second of the error terms takes into account the intercorrelations
of the columns. It is this error term which is used to test
the significance of the mean squares which involve the same sub- 
jects and hence possibly correlation. 


8. EXAMPLES 


1. Fifteen subjects are divided at random into 3 groups of 
5 subjects each. One group is presented learning material by 
method A, another by method B, and the third by method C. 
Retention for the material presented is tested at 3 different times: 
immediately after the learning period, 24 hr. after the learning 
period, and 1 week after the learning period. The series of reten- 
tion scores for the subjects in the various groups are given below. 


                               Time of Testing
              Subject    Immediate    24 Hr.    One Week
Method A         1           28         25         22
                 2           32         29         24
                 3           36         35         27
                 4           45         42         40
                 5           46         43         40
Method B         6           27         24         20
                 7           29         26         22
                 8           36         36         30
                 9           42         43         39
                10           48         44         40
Method C        11           40         38         33
                12           36         26         20
                13           50         48         44
                14           45         43         35
                15           42         39         30


(a) Analyze the total sum of squares into its component 
parts and test the various mean squares for significance. (b) 
Compute directly the sum of squares within each of the 9 cells 
above. Into what component parts has this been analyzed? 
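For checking hand computations on Example 1, the partition developed in this chapter can be sketched in Python. The function `split_plot_anova` is a hypothetical helper of ours, not a procedure from the text, and it assumes equal numbers of subjects in each group. It computes every component directly from deviations rather than by subtraction, so the additivity of the 5 components to the total, and the splitting of the within-cells sum of squares into the 2 error terms, are genuine checks rather than identities built in by construction.

```python
def split_plot_anova(groups):
    """Mixed analysis: methods between groups, trials within subjects.

    groups: one entry per method; each entry is a list of subjects, and each
    subject is a list of scores, one per trial (equal sizes assumed).
    """
    g, n, t = len(groups), len(groups[0]), len(groups[0][0])
    scores = [y for grp in groups for subj in grp for y in subj]
    grand = sum(scores) / len(scores)

    group_means = [sum(map(sum, grp)) / (n * t) for grp in groups]
    trial_means = [sum(subj[j] for grp in groups for subj in grp) / (g * n)
                   for j in range(t)]
    cell_means = [[sum(subj[j] for subj in grp) / n for j in range(t)]
                  for grp in groups]

    ss = {
        "methods": n * t * sum((m - grand) ** 2 for m in group_means),
        "subjects in same group": t * sum(
            (sum(subj) / t - group_means[i]) ** 2
            for i, grp in enumerate(groups) for subj in grp),
        "trials": g * n * sum((m - grand) ** 2 for m in trial_means),
        "trials x methods": n * sum(
            (cell_means[i][j] - group_means[i] - trial_means[j] + grand) ** 2
            for i in range(g) for j in range(t)),
        # subjects x trials interaction within each group, pooled over groups
        "pooled subjects x trials": sum(
            (subj[j] - sum(subj) / t - cell_means[i][j] + group_means[i]) ** 2
            for i, grp in enumerate(groups) for subj in grp for j in range(t)),
        "within cells": sum(
            (subj[j] - cell_means[i][j]) ** 2
            for i, grp in enumerate(groups) for subj in grp for j in range(t)),
        "total": sum((y - grand) ** 2 for y in scores),
    }
    df = {
        "methods": g - 1,
        "subjects in same group": g * (n - 1),
        "trials": t - 1,
        "trials x methods": (g - 1) * (t - 1),
        "pooled subjects x trials": g * (n - 1) * (t - 1),
        "within cells": g * t * (n - 1),
        "total": g * n * t - 1,
    }
    return ss, df

# Retention scores of Example 1 (columns: immediate, 24 hr., 1 week)
method_a = [[28, 25, 22], [32, 29, 24], [36, 35, 27], [45, 42, 40], [46, 43, 40]]
method_b = [[27, 24, 20], [29, 26, 22], [36, 36, 30], [42, 43, 39], [48, 44, 40]]
method_c = [[40, 38, 33], [36, 26, 20], [50, 48, 44], [45, 43, 35], [42, 39, 30]]

ss, df = split_plot_anova([method_a, method_b, method_c])
ms = {source: ss[source] / df[source] for source in ss}
f_methods = ms["methods"] / ms["subjects in same group"]
f_trials = ms["trials"] / ms["pooled subjects x trials"]
f_interaction = ms["trials x methods"] / ms["pooled subjects x trials"]

for source in ss:
    print(f"{source:26s} SS = {ss[source]:9.3f}  df = {df[source]:2d}")
print(f"F methods = {f_methods:.3f}, F trials = {f_trials:.3f}, "
      f"F trials x methods = {f_interaction:.3f}")
```

The printed sums of squares and F ratios can then be compared with the results obtained by hand for parts (a) and (b).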

2. Twenty-one subjects are divided at random into 3 groups 
of 7 subjects each. A complicated stylus maze has been constructed
on a large board. On the face of the board are small brass
disks arranged in columns and rows with a small space separating 
each disk. The back of the board is wired so that if particular 
disks are touched with the stylus an electrical circuit operates. 
The subjects are instructed to take the stylus and to start at the 
upper left corner of the board and to move from disk to disk, one 
at a time, to the lower right corner. There is only one path that 
can be taken without operating the circuit. One group of subjects 
is told that ability to learn the maze is closely related to intelligence. 
Another group is told that the average college student makes only 
20 errors on the fifth trial. The third group is told that they are 




simply to make as few errors as possible. Each group of subjects 
is given 5 trials. The errors are given below: 


                                  Trials
                  Subject     1     2     3     4     5
“Intelligence”       1       40    39    33    33    20
Group                2       40    33    31    23    22
                     3       38    34    30    28    26
                     4       31    29    26     2    20
                     5       38    37    36    32    26
                     6       39    33    29    28    26
                     7       38    32    28    25    21
“Average 20”         8       28    28    24    21    20
Group                9       39    25    23    23    17
                    10       32    32    28    31    26
                    11       34    27    26    25    23
                    12       35    34    27    23    18
                    13       35    27    23    21    21
                    14       32    24    24    21    22
“Few Errors”        15       40    40    30    30    29
Group               16       35    31    25    22    22
                    17       39    38    36    36    23
                    18       36    24    21    23    21
                    19       38    37    33    37    32
                    20       39    38    35    34    34
                    21       31    30    27    26    24


(a) Test the mean squares for significance. (b) Consider
only the performance of the subjects on the first trial. Taking
only these 21 measures, find the total sum of squares, the sum of
squares within groups, and the between-groups sum of squares.
Test the mean square between groups for significance. Interpret 
your results. (c) Repeat the analysis of (b), considering only 
the fifth trial. 
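Parts (b) and (c) call for an ordinary analysis of independent groups. A minimal Python sketch follows; `one_way_anova` is our own helper name. Only trials 1 and 5 are used, since the trial 4 entry for subject 4 may be damaged in the scanned table and is not needed here.

```python
def one_way_anova(groups):
    """Total, between-groups, and within-groups sums of squares for
    independent groups, with the corresponding F ratio."""
    scores = [y for grp in groups for y in grp]
    grand = sum(scores) / len(scores)
    means = [sum(grp) / len(grp) for grp in groups]
    ss_total = sum((y - grand) ** 2 for y in scores)
    ss_between = sum(len(grp) * (m - grand) ** 2 for grp, m in zip(groups, means))
    ss_within = sum((y - m) ** 2 for grp, m in zip(groups, means) for y in grp)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_total, ss_between, ss_within, df_between, df_within, f

# Errors for subjects 1-7, 8-14, and 15-21 of Example 2
trial_1 = [[40, 40, 38, 31, 38, 39, 38],
           [28, 39, 32, 34, 35, 35, 32],
           [40, 35, 39, 36, 38, 39, 31]]
trial_5 = [[20, 22, 26, 20, 26, 26, 21],
           [20, 17, 26, 23, 18, 21, 22],
           [29, 22, 23, 21, 32, 34, 24]]

for label, data in (("trial 1", trial_1), ("trial 5", trial_5)):
    ss_t, ss_b, ss_w, df_b, df_w, f = one_way_anova(data)
    print(f"{label}: total = {ss_t:.3f}, between = {ss_b:.3f} ({df_b} df), "
          f"within = {ss_w:.3f} ({df_w} df), F = {f:.3f}")
```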

3. Edwards (1941a) studied the influence of attitude upon 
the retention of material which was in conflict with and which was 
in harmony with the attitude. Upon the basis of an initial 




measure of attitude, subjects were divided into 3 groups: one group 
was “favorable,” one was "neutral," and one was “unfavorable” 
with respect to the issue in question. All 3 groups heard a speech 
which was carefully constructed to contain an equal number of 
favorable and unfavorable statements about the issue. Immedi- 
ately after the speech, the subjects were tested by means of 2 
multiple-choice tests upon the material contained in the speech. 
One of the tests covered material favorable to the issue and was 
called the “Pro Test." The other covered the material which was 
unfavorable to the issue and was called the “Anti Test." The 
subjects were also tested 2 weeks later without prior warning. In 
the original experiment, 48 subjects were tested in each of the 3 
groups. In the interests of simplicity, only the data for the first 
15 subjects in each group are reported here. A complete description
of the experiment may be found in the articles by Edwards
(1941a, 1941b). A somewhat different analysis was applied to the
original data, but a reanalysis by the methods of this chapter
indicated that no conclusions concerning significance would be
changed. Since we have 15 subjects in each group, 3 groups, and 
4 measurements on each subject, we shall have a total of 179 
degrees of freedom. 


                               Immediate Test        Delayed Test
Attitude       Subject     Pro Test  Anti Test   Pro Test  Anti Test    Sum
Favorable          1          20          9          14         10        53
Group              2          18          3          16          6        43
                   3          19         14          14         13        60
                   4          17          8          18          2        45
                   5          10          6          17          7        40
                   6          12          8           6         12        38
                   7          17          7           6         15        45
                   8          10         17          10          7        44
                   9          15         12           7         14        48
                  10          13         17          19          5        54
                  11          22          6          13          6        47
                  12          15         14          12          7        48
                  13          17          9          13          8        47
                  14          17          7          20          1        45
                  15          18         10          16          5        49
                 Sum         240        147         201        118       706
Neutral           16          14         12           8          9        43
Group             17          12          9           7         22        50
                  18           5         11          14          9        39
                  19          12          6          16          3        37
                  20           8          8          12         14        42
                  21          10         13           5         11        39
                  22          18         10           6         18        52
                  23          13          8          14         15        50
                  24           9         11          17          8        45
                  25           8          4          12          9        33
                  26           6          8          12          8        34
                  27          14         11          15         12        52
                  28          12         15           9         13        49
                  29          16          4           5          5        30
                  30          17         11           8         12        48
                 Sum         174        141         160        168       643
Unfavorable       31           8          7           6          5        26
Group             32          13          8           7          7        35
                  33          15          6          12          8        41
                  34          13          9          15          5        42
                  35          10         20           9         16        55
                  36          10         16           8          8        42
                  37          12         15          12          8        47
                  38           6         17           4         19        46
                  39           9         13           6         12        40
                  40          10         14          10         14        48
                  41          11         12           9         11        43
                  42          10         15           9         15        49
                  43           8         17           4         17        46
                  44           9         16           8         19        52
                  45          14         18          12         18        62
                 Sum         158        203         131        182       674


Analyze the total sum of squares into its component parts 
and test the various mean squares for significance. 

4. Fertig (1936) took blood cholesterol readings (mg./100 cc.)
on 18 individuals in April and then again in May. Is the rise from
April to May statistically significant? The data are given below.


Individual     April      May
     1         158.0     190.5
     2         158.5     177.0
     3         137.5     172.0
     4         145.5     152.5
     5         130.5     147.0
     6         141.0     127.0
     7         150.5     149.5
     8         142.5     152.5
     9         148.0     147.0
    10         137.5     130.5
    11         137.0     133.0
    12         160.0     145.5
    13         145.0     124.5
    14         149.5     156.0
    15         145.0     143.5
    16         132.5     146.0
    17         139.0     148.0
    18         151.0     161.0
   Sum       2,608.5   2,703.0

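With only 2 occasions, the repeated-measurements analysis reduces to the familiar test on individual differences: the F for months, tested against the individuals × months interaction on 17 degrees of freedom, equals the square of the t for paired observations. The sketch below, with a function name of our own, computes both so the equivalence can be seen; comparison with the tabled 5 per cent point of F for 1 and 17 degrees of freedom (about 4.45) is left to the reader.

```python
import math

# Blood cholesterol readings (mg./100 cc.) for the 18 individuals of Example 4
april = [158.0, 158.5, 137.5, 145.5, 130.5, 141.0, 150.5, 142.5, 148.0,
         137.5, 137.0, 160.0, 145.0, 149.5, 145.0, 132.5, 139.0, 151.0]
may = [190.5, 177.0, 172.0, 152.5, 147.0, 127.0, 149.5, 152.5, 147.0,
       130.5, 133.0, 145.5, 124.5, 156.0, 143.5, 146.0, 148.0, 161.0]

def two_occasion_analysis(first, second):
    """Months mean square tested against the individuals x months interaction,
    together with the equivalent t for paired observations."""
    n = len(first)
    diffs = [b - a for a, b in zip(first, second)]
    mean_d = sum(diffs) / n
    ss_diffs = sum((d - mean_d) ** 2 for d in diffs)
    ss_months = n * mean_d ** 2 / 2        # 1 degree of freedom
    ss_interaction = ss_diffs / 2          # n - 1 degrees of freedom
    f = ss_months / (ss_interaction / (n - 1))
    t = mean_d / math.sqrt(ss_diffs / (n - 1) / n)
    return mean_d, f, t

mean_rise, f, t = two_occasion_analysis(april, may)
print(f"mean rise = {mean_rise:.3f}, F(1, 17) = {f:.3f}, t(17) = {t:.3f}")
```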

5. Brozek and Alexander (1947) cite the fictitious data given 
below in a discussion of the analysis of variance. Test the 
significance of the "day" mean square. 


                    Days
Individual    1     2     3     4
    1        30    32    28    30
    2        19    32    35    30
    3        20    38    23    23
    4        33    36    29    38
    5        28    27    40    29
  Sum       130   165   155   150



CHAPTER 16 


The Latin Square Design in 
Psychological Research 


1. AN EXPERIMENT IN COLOR RECOGNITION 


An unpublished study by Vinacke (1943) was concerned 
with an investigation of the accuracy of the recognition of various 
colors presented to the dark-adapted eye under varying levels of 
illumination.¹ The colors investigated were yellow, blue, green,
and red. The intensity of illumination was varied in 4 ways 
from a very weak intensity through 3 increasing intensities. No 
data are given for the experiment itself, but the experimental 
design is one which we wish to discuss. It is one which is ad- 
mirably suited to certain types of problems, though there is some 
question as to whether it would be the most valuable type of 
design to apply to the problem in which Vinacke was interested. 

The design which Vinacke used is called a Latin square 
design. The Latin square is of particular value when the number
of experimental conditions ranges from 4 to perhaps 9. For a 
larger number of experimental conditions than this, the design 
becomes somewhat cumbersome in that in its usual application in 
psychological research, each subject will be tested under all ex- 
perimental conditions. For fewer than 4 experimental conditions, 
the number of degrees of freedom associated with the sum of 
squares used in arriving at an error term becomes too small.²

A Latin square design requires that we have as many replica- 
tions as we have experimental conditions. In the experiment 


1 Reported in Garrett and Zubin (1943). 

2 Unless the complete experiment is replicated with a series of Latin
squares. Additional complications, however, arise with the 2 × 2 Latin
square. Grant (1948) discusses these and also provides an excellent general
treatment of the Latin square design in psychological research.




mentioned, the design is schematically represented by Table 64. 
Note that each color occurs once and only once in each row and 
each column. This is a restriction arbitrarily imposed by the 
experimenter. The colors have been assigned at random, how- 
ever, within the limits of this restriction, to the cells of Table 64. 


TABLE 64. Latin Square Design of a Color Recognition Experiment with
Intensity of Illumination Varied in 4 Ways and with 4 Orders of
Presentation of the 4 Colors

Order of          Illumination Levels
Presentation     1     2     3     4
     a           R     B     Y     G
     b           G     R     B     Y
     c           B     Y     G     R
     d           Y     G     R     B


It would obviously be possible to calculate a sum of squares
for the 16 cells of Table 64 by squaring each of the entries, summing
these squares, and subtracting the square of the sum of all the entries, divided by 16.
This would be the familiar total sum of squares and would have 
15 degrees of freedom. It would also be possible to calculate 
a sum of squares based upon the column totals in the manner 
which is already familiar, and this sum of squares would be as- 
sociated with differences in the illumination levels and would be 
based upon 3 degrees of freedom. Another sum of squares could 
be calculated for the row totals, and this sum of squares would be 
associated with differences between orders of presentation of the 
colors and/or whatever the rows of the table happen to correspond 
with. Still another sum of squares could be calculated by taking
the sums of the cell entries for each of the various colors; that is
to say, symbolically speaking, taking $\sum R$, $\sum B$, $\sum Y$, and $\sum G$.
And since we have 4 colors, we would have 3 degrees of freedom 
associated with the differences between the various colors. 

The calculations above account for 9 of the 15 degrees of 
freedom, and by subtraction we would have a remainder (residual) 
of 6 degrees of freedom. Associated with these 6 degrees of free- 
dom is a remainder or residual sum of squares, which also may be 




LATIN SQUARE DESIGN IN PSYCHOLOGICAL RESEARCH 305 


obtained by subtraction. We take the total sum of squares and 
from this subtract the sum of squares for rows, columns, and 
colors, and this will give us the residual. The completed analysis 
would be as given below: 


Source of Variation   Sum of Squares   df   Mean Square   F
Rows                                     3
Columns                                  3
Colors                                   3
Residual                                 6
Total                                   15


The residual mean square, based upon 6 degrees of freedom, 
may be used for testing the significance of the mean squares for 
rows, columns, and colors. If the experimenter's main interest 
is in the differences in the accuracy of recognition of the colors, and 
if the order of presentation and illumination level are major 
sources of variation, then the Latin square design used here to 
investigate this problem may be considered adequate. For if 
order of presentation, or whatever it is that is measured by the 
row sum of squares, and level of illumination contribute to the 
total sum of squares, these sources of variation are controlled, in 
the sense that the sums of squares attributable to them can be
calculated and isolated from the sum of squares used to derive 
the error term. 

On the other hand, this design permits no test of the signifi- 
cance of the possible interaction between illumination levels and 
the color variable. On the basis of well-established research, this 
would seem to be of some importance, and a factorial design would 
permit this analysis and a test of significance of the interaction. 

It is not clear from the account given by Garrett and Zubin,
but it would seem in the experiment described that a single subject
appears in each row so that the order of presentation differences
may also be described as differences between subjects. If the
same subject's records do appear in the rows, then this design
will not permit the sum of squares attributable to over-all
differences between subjects to be separated from that attributable
to the order of presentation of the colors. For example, if the row mean square should prove
to be significant, we would not know whether this was due pri- 




marily to differences between the subjects or to differences be- 
tween the orders of presentation. 


2. THE BLISS AND ROSE EXPERIMENT 


In the application of the Latin square design to the problem 
now to be considered, we have some actual data to work with, and 
some of the details touched upon with respect to Vinacke's problem 
may be seen more clearly. Bliss and Rose (1940) also used a
4 × 4 Latin square design. The experimental condition consisted
of a dosage of an extract of parathyroid, and 4 preparations were
used. These are labeled as U₁, U₂, S₁, and S₂, but the exact
nature of these dosages is unimportant for our purpose. The 
dosages were given to each of 4 dogs at 4 different times. In the 
original experiment by Bliss and Rose, the Latin square design 
was used 5 different times, giving a total of 80 measurements, but 
we shall consider, for the present, but one of the designs. 




TABLE 65. Latin Square Design for an Experiment Involving 4 Dosages
of a Drug with 4 Different Test Days and 4 Different Animals

                 Days
Dogs      1      2      3      4
  1       S₁     S₂     U₂     U₁
  2       U₂     U₁     S₁     S₂
  3       S₂     S₁     U₁     U₂
  4       U₁     U₂     S₂     S₁


The design is shown in Table 65, and the outcomes or measurements,
which were in terms of mg. per cent serum calcium, are given in
Table 66. The columns correspond to the periods or days on which
the dosages were administered, the rows to the individual dogs, and
the cell entries correspond to the treatments or dosages administered.


3. A METHOD FOR ASSIGNING TREATMENTS IN THE LATIN 
SQUARE 


Before analyzing the data of Table 66, it may be worthwhile
to examine the method by which the treatments or experimental
conditions are assigned at random to the cells of the table,
subject to the restriction that the dosage or treatment may occur
once and only once in each column and each row.

TABLE 66. Outcomes of the Latin Square Design for an Experiment Involving
4 Dosages of a Drug with 4 Different Test Days and 4 Different Dogs

                        Days
Dogs      1        2        3        4        Sum       Mean
  1     13.8     17.0     16.0     16.0      62.8    15.7000
  2     15.8     14.3     14.8     15.4      60.3    15.0750
  3     15.0     14.5     14.0     15.0      58.5    14.6250
  4     14.7     15.4     14.8     14.0      58.9    14.7250
Sum     59.3     61.2     59.6     60.4     240.5
Mean  14.8250  15.3000  14.9000  15.1000            15.03125

Let us consider a 5 × 5 Latin square. This will involve 5
experimental conditions or treatments, and these may be numbered 
0, 1, 2, 3, and 4. Then entering a table of random numbers, we 
may take blocks of two (two rows or two columns) and divide 
each pair of random numbers by the number of experimental 
conditions, which in the present case would mean division by 5. 
Then the remainders will be 0, 1, 2, 3, or 4. For example, 00 
divided by 5 gives a remainder of zero, 01 will give a remainder of 
1, 02 will give a remainder of 2,..., 05 will give a remainder of 
zero, 06 will give a remainder of 1, and so on. Then the experimental
conditions will be assigned to the positions in the first row
in the order in which the remainders appear from the numbers
randomly selected, a remainder that has already appeared in the row
being disregarded. This procedure is then repeated for the
second and for subsequent rows of the Latin square. It will
usually be found that after 3 or 4 rows have been drawn, 1 or 2
treatments will appear more than once in a given column. When
this happens, it will be necessary to redraw the row until the
restriction that each experimental condition appear only once in
each column is satisfied. If a small
number of Latin squares are to be drawn, then this method will 
prove satisfactory. If many Latin squares are to be drawn, then 
it will be profitable to consult the tables of Fisher and Yates 


(1943). 
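The remainder method, with a row redrawn whenever it repeats a condition in some column, can be sketched in Python. Here `random.Random.randrange(100)` stands in for reading a pair of digits from a table of random numbers, and the function names are ours. The sketch also abandons a square and starts over if a row cannot be placed after many attempts. Like the hand procedure, it need not give every Latin square of a given size exactly the same chance of being drawn, and when the number of conditions does not divide 100 (6, 7, 8, or 9) the remainders are not quite equally likely; for such uses the Fisher and Yates tables remain the safer source.

```python
import random

def remainder_row(k, rng):
    """Draw two-digit random numbers, keep each remainder on division by k,
    and fill the row in order of first appearance (repeats are skipped)."""
    row = []
    while len(row) < k:
        remainder = rng.randrange(100) % k   # a two-digit number, 00 to 99
        if remainder not in row:
            row.append(remainder)
    return row

def random_latin_square(k, rng=None, max_redraws=1000):
    """Build a k x k Latin square row by row, redrawing any row that puts a
    condition twice in the same column; restart if a row cannot be placed."""
    rng = rng or random.Random()
    while True:
        square = []
        for _ in range(k):
            for _ in range(max_redraws):
                row = remainder_row(k, rng)
                if all(row[c] != earlier[c] for earlier in square for c in range(k)):
                    square.append(row)
                    break
            else:
                break  # no acceptable row found; abandon this square
        if len(square) == k:
            return square

square = random_latin_square(5, random.Random(1950))
for row in square:
    print(" ".join(str(condition) for condition in row))
```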
The general problem of systematic versus randomly selected
Latin squares has been discussed by Kendall (1948b) and Fisher
(1942). Consider the following systematic arrangement about the
diagonal of a 4 × 4 Latin square:


A B C D
D A B C
C D A B
B C D A


Let us suppose that the rows correspond to subjects and that
each subject is tested under the experimental conditions, represented
by the letters, in the particular order indicated by the arrangement
of the letters across rows. The columns will then
correspond to the various trials or test periods. We may note
that in the 4 × 4 Latin square the permutations of the 4 experimental
treatments taken 2 at a time would yield


AB BA CA DA 
AC BC CB DB 
AD BD CD DC 


but that not all of these appear in the diagonal square. Instead 
AB, BC, CD, and DA each occur 3 times. There is no case where
the subject experiences the B condition immediately before the A
condition, the C condition immediately before the B condition, and so on.

If some systematic influence is present such that performance 
under the B condition preceded by the A condition tends either to 
be depressed or raised, then a systematic bias will be present in 
$\sum B$. Other systematic errors will be operative if relationships
exist between B and C, C and D, or D and A. Depending upon 
the nature of these relationships, the systematic errors may tend 
either to increase or decrease the differences among the treatment 
means and thus to increase or decrease the treatment mean square 
as compared with the mean square which may be expected if the 
treatments were randomly assigned, subject only to the usual 
restrictions of the Latin square. If the bias is such that the treat- 
ment sum of squares is increased, then the residual (error) sum of 
squares will be decreased compared with that to be expected from 
random arrangements. Similarly, if the treatment sum of squares 
is decreased, the residual (error) sum of squares will be increased 




compared with that to be expected from random arrangements. 
In random arrangements, B will not consistently follow A, as in 
the Latin square described, but rather this will be determined by 
chance. A randomly selected square will, of course, be drawn 
occasionally which will correspond to the diagonal square, but this 
again may be expected to occur no more frequently than chance 
would indicate. 

Bugelski (1949) has pointed out that any Latin square which 
involves an even number of rows and columns may be arranged in 
such a way that each treatment always precedes and follows a
different treatment, although this is not possible for squares
which have an odd number of rows and columns. Consider, for 
example, the following Latin square: 


A B C D
B D A C
C A D B
D C B A


Running down the successive pairs of columns, it may be 
observed that we have AB, BD, CA, DC, BC, DA, AD, CB, CD, 
AC, DB, and BA. In this systematic square each letter appears 
but once in each row and each column, meeting the requirements 
of a Latin square, and each treatment follows every other treat- 
ment but once. Any systematic influence resulting from one 
experimental condition being followed by another would thus tend 
to be minimized in this square. 
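Counting the ordered pairs of treatments given in succession within each row makes the contrast between the 2 systematic squares concrete. The sketch below, with function names of our own, checks the Latin square property and tallies successive pairs; for the diagonal square it finds only AB, BC, CD, and DA, each 3 times, while for the square just shown it finds all 12 ordered pairs once each.

```python
from collections import Counter

def is_latin(square):
    """True if every row and every column contains each symbol exactly once."""
    k = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == k and set(row) == symbols for row in square)
    cols_ok = all({row[c] for row in square} == symbols for c in range(k))
    return len(symbols) == k and rows_ok and cols_ok

def adjacent_pairs(square):
    """Tally ordered pairs (earlier, later) of treatments given in succession."""
    return Counter((row[c], row[c + 1])
                   for row in square for c in range(len(row) - 1))

diagonal = ["ABCD", "DABC", "CDAB", "BCDA"]
bugelski = ["ABCD", "BDAC", "CADB", "DCBA"]

for name, sq in (("diagonal", diagonal), ("Bugelski", bugelski)):
    pairs = adjacent_pairs(sq)
    print(name, is_latin(sq), dict(sorted(pairs.items())))
```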

In agricultural research it has been found that systematic 
squares, such as the diagonal square, applied in uniformity trials, 
where the null hypothesis is known to be true, yield results which 
are biased compared with those obtained from random arrange- 
ments. Similar studies of systematic and randomly selected 
squares have not been made in psychological research. 

Since the conditions under which the Latin square design is 
applied in agricultural research are not quite the same as in psy- 
chological research, as Grant (1948) has pointed out, we can only 
speculate concerning the relative merits of systematic and ran- 
domly selected squares in connection with a specific research 
problem—as we have done in the case of the diagonal square. 




Bugelski's square, for example, may prove to be extremely useful 
in certain psychological research problems of the kind he has 
described. To what extent, if any, the tests of significance for 
this square may be biased is not known. 

Some psychologists may, for one reason or another, choose 
a particular systematic square. Other psychologists may prefer 
the satisfaction of knowing that they are following a sound and 
well-established precedent in other fields of research when they 
use squares with random arrangements. 


4. THE ANALYSIS OF VARIANCE FOR THE LATIN SQUARE 


Let us return now to the analysis of the data of Table 66. 
The total sum of squares between the cells of the table may be 
calculated in the familiar manner. Thus, 


$$(13.8)^2 + (15.8)^2 + (15.0)^2 + \cdots + (14.0)^2 - \frac{(240.5)^2}{16} = 11.29$$

The sum of squares between days (columns) will be given by

$$\frac{(59.3)^2}{4} + \frac{(61.2)^2}{4} + \frac{(59.6)^2}{4} + \frac{(60.4)^2}{4} - \frac{(240.5)^2}{16} = .54$$

The sum of squares for dogs (rows)³ is obtained by

$$\frac{(62.8)^2}{4} + \frac{(60.3)^2}{4} + \frac{(58.5)^2}{4} + \frac{(58.9)^2}{4} - \frac{(240.5)^2}{16} = 2.83$$


The sums for treatments (dosages) are obtained by adding
the cell entries for each dosage. For example, the sum of U₁ is
given by 16.0 + 14.3 + 14.0 + 14.7 = 59.0. The sum for S₁ is
given by 13.8 + 14.8 + 14.5 + 14.0 = 57.1. The other 2 sums
for U₂ and S₂ are obtained in the same way. These sums and
means are given in Table 67. 


3 We shall refer to the row sum of squares as the sum of squares between
dogs, though it must be kept in mind that this sum of squares might also be
referred to as the sum of squares for the order of presentation of the treatments.
It is impossible to separate the effects of these 2 variables in the present
design, i.e., they are confounded. We may assume, however, that a sufficient 
period of time elapsed between the administration of the various dosages so 
that the particular order of presentation was unimportant. 




TABLE 67. Sums and Means for the 4 Different Dosages of the Drugs
in the Latin Square Design Involving 4 Animals

Dosage     Sum        Mean
S₁         57.1      14.275
S₂         62.2      15.550
U₂         62.2      15.550
U₁         59.0      14.750
Sum       240.5      15.03125


The sum of squares for dosages will then be given by 


$$\frac{(57.1)^2}{4} + \frac{(62.2)^2}{4} + \frac{(62.2)^2}{4} + \frac{(59.0)^2}{4} - \frac{(240.5)^2}{16} = 4.75$$


The sums of squares for days, dogs, and dosages, each based 
upon 3 degrees of freedom, may now be subtracted from the total 
sum of squares. This will give a residual sum of squares⁴ which
will be equal to

$$\text{Total} - \text{days (columns)} - \text{dogs (rows)} - \text{dosages} = \text{residual} \qquad (69)$$

and substituting in formula (69) above, we obtain

$$11.29 - .54 - 2.83 - 4.75 = 3.17$$


Similarly, by subtracting the degrees of freedom, we shall have 
a residual number of degrees of freedom associated with the 
residual sum of squares equal to 6. The residual sum of squares, 
when divided by the residual number of degrees of freedom, will 
provide the error mean square which will be used to test the sig- 
nificance of the other mean squares. 
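The whole analysis of Table 66 can be sketched in Python, using the dosage layout of Table 65. The function name is ours. Because the code carries full precision, its results differ slightly from the two-decimal figures in the text: the days sum of squares comes to about .547, the dosages sum of squares to about 4.757, and the residual to about 3.159 rather than 3.17, giving an F for dosages of about 3.01. None of these differences changes the conclusion, since an F of 4.76 is still required for significance with 3 and 6 degrees of freedom.

```python
# Table 66 (rows = dogs 1-4, columns = days 1-4) and Table 65 (dosage in each cell)
y = [[13.8, 17.0, 16.0, 16.0],
     [15.8, 14.3, 14.8, 15.4],
     [15.0, 14.5, 14.0, 15.0],
     [14.7, 15.4, 14.8, 14.0]]
design = [["S1", "S2", "U2", "U1"],
          ["U2", "U1", "S1", "S2"],
          ["S2", "S1", "U1", "U2"],
          ["U1", "U2", "S2", "S1"]]

def latin_square_anova(y, design):
    """Rows, columns, treatments, and residual for a k x k Latin square."""
    k = len(y)
    cells = [(i, j) for i in range(k) for j in range(k)]
    correction = sum(y[i][j] for i, j in cells) ** 2 / (k * k)
    row_sums = [sum(row) for row in y]
    col_sums = [sum(y[i][j] for i in range(k)) for j in range(k)]
    treatment_sums = {}
    for i, j in cells:
        label = design[i][j]
        treatment_sums[label] = treatment_sums.get(label, 0.0) + y[i][j]

    ss = {
        "rows": sum(s ** 2 for s in row_sums) / k - correction,
        "columns": sum(s ** 2 for s in col_sums) / k - correction,
        "treatments": sum(s ** 2 for s in treatment_sums.values()) / k - correction,
        "total": sum(y[i][j] ** 2 for i, j in cells) - correction,
    }
    # Residual by subtraction, as in formula (69)
    ss["residual"] = ss["total"] - ss["rows"] - ss["columns"] - ss["treatments"]
    df = {"rows": k - 1, "columns": k - 1, "treatments": k - 1,
          "residual": (k - 1) * (k - 2), "total": k * k - 1}
    ms_error = ss["residual"] / df["residual"]
    f = {name: (ss[name] / df[name]) / ms_error
         for name in ("rows", "columns", "treatments")}
    return ss, df, f

ss, df, f = latin_square_anova(y, design)
for name in ("rows", "columns", "treatments", "residual", "total"):
    line = f"{name:10s} SS = {ss[name]:.4f}  df = {df[name]:2d}"
    if name in f:
        line += f"  F = {f[name]:.3f}"
    print(line)
```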

The summary of this analysis is shown in Table 68. It may
be determined from the table of F that a value of 4.76 will be
required for significance for 3 and 6 degrees of freedom, and hence
none of the mean squares tested in Table 68 meets the requirements
of significance.⁵ From Table 68 and the various calculations we
have made, it should be clear that the variation attributable to
days and individual dogs has been isolated and is not included in
the error term. In this sense, these variables have been controlled,
although it happens that for this particular Latin square, these
sources of variation are not significant. The efficiency of the Latin
square design rests, however, in the fact that it does permit a
control of the row and column variables, and in some experiments
these sources, if not isolated, would be included in the sum of
squares for error, thus inflating the error mean square unnecessarily.

TABLE 68. Analysis of Variance of the Latin Square Design for the
Experiment on 4 Different Dosages of a Drug

Source of Variation       Sum of Squares   df   Mean Square      F
Dogs (rows)                     2.83         3       .943      1.786
Days (columns)                   .54         3       .180
Treatments (dosages)            4.75         3      1.583      2.998
Residual (error)                3.17         6       .528
Total                          11.29        15

4 The direct calculation of the residual sum of squares is possible and
will be illustrated later.

5 The complete experiment of Bliss and Rose, involving 5 Latin squares,
did show significant differences between the treatments.


5. THE DIRECT CALCULATION OF THE RESIDUAL SUM OF 
SQUARES 


Let us examine now, in somewhat greater detail, the sum of 
squares which we have called the residual. Each cell entry in 
Table 66 has associated with it an “expected” value, and these 
expected values may be obtained by means of a kind of regression 
equation. If we knew nothing about the row, column, and dosage 
means, for example, and predicted the expected entry for each cell, 
our best prediction for every cell would be the combined mean 
15.03125. This would be the best predicted value for each of the 
entries in the sense that the sum of squared deviations of values 
in the cells from the combined mean would be less than from any 
other single value that we might take. If we let Y’ equal the 
predicted value for a cell entry and Y equal the actual observed 
value, then 

$$Y' = \bar{Y}$$


LATIN SQUARE DESIGN IN PSYCHOLOGICAL RESEARCH 313 


and

$$\sum (Y - Y')^2 = \sum (Y - \bar{Y})^2$$

will be at a minimum.

However, with knowledge of the mean performance on each 
of the various days, the mean performance of each of the dogs, and 
the means for each of the dosages, we may add some additional 
terms to the regression equation and obtain the following as the 
predicted value for the various cell entries. Thus, 


$$Y' = \bar{Y} + (\bar{Y}_c - \bar{Y}) + (\bar{Y}_r - \bar{Y}) + (\bar{Y}_d - \bar{Y}) \qquad (70)$$

which may be simplified to

$$Y' = \bar{Y}_c + \bar{Y}_r + \bar{Y}_d - 2\bar{Y} \qquad (71)$$


In each of the above expressions, $Y'$ stands for the predicted
or expected value of the cell entry; $\bar{Y}_c$ stands for the mean of the
column; $\bar{Y}_r$ stands for the mean of the row; $\bar{Y}_d$ stands for the mean
of the particular dosage (or treatment); and $\bar{Y}$ stands for the
combined mean. Applying formula (71), we obtain as the expected
value for the entry in the first row and first column


$$Y'_{11} = 14.82500 + 15.70000 + 14.27500 - (2)(15.03125) = 14.73750$$


For the expected value in the first row and second column, we
obtain

$$Y'_{12} = 15.30000 + 15.70000 + 15.55000 - (2)(15.03125) = 16.48750$$


In a similar manner we may obtain the other cell entries, and these 
have been entered in Table 69. We may note that, as a check 


TABLE 69. The “Expected” Cell Entries of the Latin Square Design for
the Experiment on 4 Different Dosages of a Drug

                           Days
Dogs        1          2          3          4          Sum        Mean
1        14.7375    16.4875    16.0875    15.4875     62.8000    15.7000
2        15.3875    15.0625    14.1875    15.6625     60.3000    15.0750
3        14.9375    14.1375    14.2125    15.2125     58.5000    14.6250
4        14.2375    15.5125    15.1125    14.0375     58.9000    14.7250
Sum      59.3000    61.2000    59.6000    60.4000    240.5000
Mean     14.8250    15.3000    14.9000    15.1000                15.03125


upon the arithmetic, the column means and row means of the
table of expected entries will be exactly equal to the corresponding 
row and column means of the table of the original entries. 
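As a quick check on formula (71), here is a minimal Python sketch (the function names are ours, not the author's) that reproduces the two expected entries worked out above from the column, row, dosage, and combined means quoted in the text. It also evaluates the unsimplified form (70) to show that the two formulas agree.

```python
def expected_70(col_mean, row_mean, treat_mean, grand_mean):
    """Formula (70): combined mean plus the three mean deviations."""
    return (grand_mean + (col_mean - grand_mean)
            + (row_mean - grand_mean) + (treat_mean - grand_mean))


def expected_71(col_mean, row_mean, treat_mean, grand_mean):
    """Formula (71): column mean + row mean + treatment mean - 2 * combined mean."""
    return col_mean + row_mean + treat_mean - 2 * grand_mean


GRAND_MEAN = 15.03125

# (column mean, row mean, dosage mean) for the two cells worked in the text
cell_11 = (14.825, 15.700, 14.275)   # dog 1, day 1
cell_12 = (15.300, 15.700, 15.550)   # dog 1, day 2

for label, means in (("Y'11", cell_11), ("Y'12", cell_12)):
    y71 = expected_71(*means, GRAND_MEAN)
    y70 = expected_70(*means, GRAND_MEAN)
    assert abs(y71 - y70) < 1e-12     # (70) and (71) are algebraically identical
    print(label, f"{y71:.5f}")
```

The same function, applied cell by cell with the appropriate means, would generate the whole of Table 69.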

If we now subtract the expected entries in Table 69 from the
corresponding observed entries in Table 66, we obtain the
deviations in Table 70. The entries in this table thus give


TABLE 70. Deviations of the Observed from the Expected Cell Entries
for the Experiment on 4 Different Dosages of a Drug

                       Days
Dogs        1         2         3         4        Sum
1        −.9375     .5125    −.0875     .5125     .0000
2         .4125    −.7625     .6125    −.2625     .0000
3         .0625     .3625    −.2125    −.2125     .0000
4         .4625    −.1125    −.3125    −.0375     .0000
Sum       .0000     .0000     .0000     .0000     .0000


$Y - Y'$, where $Y'$ is determined by means of formula (71). Then,
squaring the deviations and summating, we have

$$\sum (Y - Y')^2 = \text{residual sum of squares} \qquad (72)$$

Thus, applying formula (72), we have

$$\sum (Y - Y')^2 = (-.9375)^2 + (.4125)^2 + (.0625)^2 + \cdots + (-.0375)^2 = 3.16$$


which is equal, within errors of rounding, to the residual sum of 
squares as obtained previously. It should be obvious that though 
it is possible to calculate the residual sum of squares directly, it 
is much easier to obtain this sum of squares by subtraction as we 
first did. 
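The direct route can be sketched in a few lines of Python. The sketch below takes the deviations $Y - Y'$ of Table 70 (rows are dogs, columns are days), confirms that every row and every column of deviations sums to zero, and then applies formula (72). The variable names are illustrative only.

```python
# Deviations Y - Y' from Table 70 (rows = dogs, columns = days).
deviations = [
    [-0.9375,  0.5125, -0.0875,  0.5125],
    [ 0.4125, -0.7625,  0.6125, -0.2625],
    [ 0.0625,  0.3625, -0.2125, -0.2125],
    [ 0.4625, -0.1125, -0.3125, -0.0375],
]

row_sums = [sum(row) for row in deviations]
col_sums = [sum(col) for col in zip(*deviations)]

# Formula (72): the residual sum of squares is the sum of squared deviations.
residual_ss = sum(d * d for row in deviations for d in row)

print("largest |row or column sum|:", max(abs(s) for s in row_sums + col_sums))
print("residual sum of squares:", round(residual_ss, 5))
```

The result, 3.15875, agrees within rounding with the 3.17 obtained by subtraction in Table 68.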

We may note from the entries in the table of deviations,
Table 70, that the row means and the column means are now
equal to zero. This will also be true of the dosage means. The
squares of these deviations thus represent the variation that
remains in the data after the variation attributable to the row,
column, and treatment means has been taken into account.⁶


6. APPLICATIONS OF THE LATIN SQUARE DESIGN 


Let us now inquire as to the types of psychological problems
for which the Latin square design may prove adequate and
efficient. For one thing, whenever experimental treatments or
testings must be spread over a period of days, the day-to-day 
variation in performance of the subjects may be worth controlling, 
and this can be accomplished with a Latin square design. 
Similarly, when the temporal order or sequence of presentation 
of a series of test conditions may be expected to contribute to 
the total variation, it may prove desirable to isolate the sum of 
squares for this factor, and this can be accomplished with the 
Latin square design. 
It is possible, also, in a single experiment to provide additional
replications of the treatments by drawing a series of independent
Latin squares. This will serve to increase the number of degrees 
of freedom for the residual sum of squares and thus increase the 
efficiency of the experiment. For example, in the Bliss and Rose 
experiment, 20 dogs were divided into 5 sets of 4 each. For each 
set of 4 dogs an independently drawn Latin square design was 
used. On the first day, all 20 dogs were given the dose prescribed 
by the separate Latin square entries, corresponding to the first 
column of all 5 Latin squares. At the time of the second test 
all animals were given the dose prescribed by the separate Latin 
square entries corresponding to the second column, and so on. 
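Each set of dogs in such a replicated design receives its own, independently drawn square. One simple way to produce a valid square, sketched below in Python, is to start from a cyclic square and shuffle its rows, its columns, and the assignment of dosage labels. This is our illustration, not the procedure Bliss and Rose are reported to have used, and one limitation should be stated plainly: the scheme always yields a Latin square, but it does not choose uniformly among all Latin squares of a given size.

```python
import random


def random_latin_square(n, symbols, rng):
    """Shuffle the rows, columns and symbols of a cyclic n x n square.

    Always yields a valid Latin square, but does not sample uniformly
    from all Latin squares of order n.
    """
    base = [[(r + c) % n for c in range(n)] for r in range(n)]
    rows = rng.sample(range(n), n)
    cols = rng.sample(range(n), n)
    labels = rng.sample(symbols, n)
    return [[labels[base[r][c]] for c in cols] for r in rows]


def is_latin(square):
    """True if every symbol occurs once in each row and each column."""
    n = len(square)
    rows_ok = all(len(set(row)) == n for row in square)
    cols_ok = all(len({square[r][c] for r in range(n)}) == n for c in range(n))
    return rows_ok and cols_ok


rng = random.Random(1950)   # fixed seed so the draw can be reproduced
# Five independently drawn 4 x 4 squares, one per set of 4 dogs
# (rows = dogs, columns = days, entries = dosages labeled A-D).
squares = [random_latin_square(4, ["A", "B", "C", "D"], rng) for _ in range(5)]
for sq in squares:
    print(" ".join("".join(row) for row in sq))
```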
Analyzing the data from a single Latin square, we would have 


Source of Variation df 
Days (columns) 3 
Dosage 3 
Dogs (rows) 3 
Residual 6 

Total 15 


6 The reader may have recognized, from formula (69), that the residual 
sum of squares in the Latin square design is the row times column interaction 
sum of squares minus the treatment sum of squares. 


With 5 Latin squares, each of the above factors will be
multiplied by 5 to account for 75 of the total of 79 degrees of
freedom available. The remaining 4 degrees of freedom will be
associated with the differences among the 5 Latin squares. Before
continuing with the analysis, however, the residual mean squares,
each based upon 6 degrees of freedom, should be tested for
homogeneity of variance. If this hypothesis is tenable, then we
would have the following analysis:


Source of Variation       df
Days                      15
Dosage                    15
Dogs                      15
Latin squares              4
Residual                  30

Total                     79

The sums of squares and degrees of freedom shown above,
however, may be further subdivided, combined, and rearranged
in the following manner:

Source of Variation       df      Source of Variation          df
Days                      15      Days                          3
                                  Days × Latin squares         12
Dosage                    15      Dosage                        3
                                  Dosage × Latin squares       12
Latin squares              4      Dogs                         19
Dogs in same square       15
Residual                  30      Residual                     30

Total                     79      Total                        79


With no evidence against the homogeneity of the residual
mean squares obtained from the separate Latin squares, and with
nonsignificant mean squares for the two interactions, the interaction
sums of squares may be pooled with the residual to give the following
analysis:


Source of Variation df 
Days (columns) 3 
Dogs (rows) 19 
Dosage 3 
Residual (error) 54 

Total 79 




The residual sum of squares in the above analysis may be obtained 
by the subtraction of the days, dogs, and dosage sums of squares 
from the total sum of squares. 
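The degrees-of-freedom bookkeeping for $k$ independent $t \times t$ squares follows a pattern that is easy to mechanize. The Python sketch below (the function name `replicated_df` is ours) reproduces the final pooled table for $t = 4$ dosages and $k = 5$ squares; setting $k = 1$ gives the single-square table shown earlier.

```python
def replicated_df(t, k):
    """df for k independent t x t Latin squares, after pooling.

    Dogs (rows): differences among squares (k - 1) plus dogs within
    squares k(t - 1), i.e. kt - 1 in all.
    Residual: the k within-square residuals, k(t - 1)(t - 2), pooled with
    the days x squares and dosage x squares interactions, 2(t - 1)(k - 1).
    """
    table = {
        "Days (columns)": t - 1,
        "Dogs (rows)": k * t - 1,
        "Dosage": t - 1,
        "Residual (error)": k * (t - 1) * (t - 2) + 2 * (t - 1) * (k - 1),
    }
    table["Total"] = k * t * t - 1
    # The components must exhaust the total degrees of freedom.
    assert sum(v for name, v in table.items() if name != "Total") == table["Total"]
    return table


for name, df in replicated_df(4, 5).items():
    print(f"{name:<20}{df:>4}")
```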

Before combining data from separate Latin squares, as in 
this analysis, however, it is advisable to analyze the data of the 
separate Latin squares first. If the separate treatment (dosage) 
mean squares obtained from the separate Latin squares are not 
homogeneous, it would indicate that the experimental technique 
varies from one experiment to another and is therefore not reliable. 
A similar conclusion would be reached if the residual mean squares 
obtained from the various Latin squares were not homogeneous. 
Furthermore, if the residual mean squares prove not to be homo- 
geneous, there is no basis for combining the data to arrive at the 
last analysis shown above. The tests of homogeneity of variance
may be made by means of Bartlett's test described previously.⁷
With proper randomization and adequate experimental techniques,
we would expect the assumptions of homogeneity of variance
to be valid.

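Bartlett's test needs only the mean squares and their degrees of freedom. A minimal Python sketch follows, using the standard form of the statistic with natural logarithms, $\chi^2 = [N \ln s_p^2 - \sum \nu_i \ln s_i^2]/C$ with $N = \sum \nu_i$ and $C = 1 + [\sum 1/\nu_i - 1/N]/[3(k-1)]$, referred to chi square with $k - 1$ degrees of freedom; the computing layout given on pages 195-198 may differ. For illustration it is applied to the two residual mean squares of the target-location experiment later in this chapter, 75.024/12 and 480.400/80.

```python
import math


def bartlett_chi_square(mean_squares, dfs):
    """Bartlett's statistic for homogeneity of k variance estimates.

    mean_squares[i] is an estimate based on dfs[i] degrees of freedom.
    Returns (chi_square, k - 1).
    """
    if len(mean_squares) != len(dfs) or len(dfs) < 2:
        raise ValueError("need at least two mean squares, each with its df")
    k = len(dfs)
    total_df = sum(dfs)
    pooled = sum(ms * df for ms, df in zip(mean_squares, dfs)) / total_df
    m = total_df * math.log(pooled) - sum(
        df * math.log(ms) for ms, df in zip(mean_squares, dfs))
    c = 1 + (sum(1 / df for df in dfs) - 1 / total_df) / (3 * (k - 1))
    return m / c, k - 1


chi2, df = bartlett_chi_square([75.024 / 12, 480.400 / 80], [12, 80])
print(f"chi-square = {chi2:.4f} with {df} df")
```

The value is far below 3.84, the 5 per cent point of chi square with 1 degree of freedom. Identical mean squares give a statistic of exactly zero, which is a convenient sanity check.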
7. COMBINING THE FACTORIAL DESIGN WITH THE LATIN 
SQUARE 


The Latin square design may also be combined with the 
factorial design. Suppose, for example, that we have 3 variables 
A, B, and C, and that each is varied in 2 ways giving rise to 
(2)(2)(2) = 8 experimental conditions. These 8 experimental 
conditions may be arranged in an 8 × 8 Latin square. The
columns of the square might again correspond to the particular
days upon which the experimental treatments are given, the rows
to the individual subjects, and the cell entries to the experimental
conditions. The analysis of this design would take the following
form:
Source of Variation             df
Days (columns)                   7
Subjects (rows)                  7
Experimental conditions          7
Residual (error)                42

Total                           63


7 Pages 195-198.


In the design described, it would also be possible to analyze
further the sum of squares corresponding to experimental
conditions, as is possible with the factorial designs discussed
previously. The sum of squares for experimental conditions, with its
7 degrees of freedom, would be partitioned into 7 components, each
based upon 1 degree of freedom, in the following way:

Source of Variation                df
Between A                           1
Between B                           1
Between C                           1
Interaction: A × B                  1
Interaction: A × C                  1
Interaction: B × C                  1
Interaction: A × B × C              1

Total experimental conditions       7


The methods of computation would be the same as those 
outlined in the chapters on factorial designs. 
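The partition into single-degree-of-freedom components can be sketched as follows. In the 8 × 8 square each of the 8 conditions occurs once in every row and column, so each condition total is based on $n = 8$ observations, and each component is the square of a ±1 contrast among the 8 totals divided by $8n$. The condition totals in the usage lines are hypothetical numbers chosen only to exercise the code; the point of the check is that the 7 components always add up to the sum of squares for experimental conditions.

```python
from itertools import product

EFFECTS = {
    "Between A": (1, 0, 0), "Between B": (0, 1, 0), "Between C": (0, 0, 1),
    "Interaction: A x B": (1, 1, 0), "Interaction: A x C": (1, 0, 1),
    "Interaction: B x C": (0, 1, 1), "Interaction: A x B x C": (1, 1, 1),
}


def partition_2x2x2(totals, n):
    """Split the 7-df conditions sum of squares of a 2 x 2 x 2 factorial.

    totals maps (a, b, c), each level 0 or 1, to the total of the n
    observations taken under that condition.
    """
    components = {}
    for name, uses in EFFECTS.items():
        contrast = 0.0
        for levels, total in totals.items():
            sign = 1
            for used, level in zip(uses, levels):
                if used:                      # -1 at level 0, +1 at level 1
                    sign *= 1 if level else -1
            contrast += sign * total
        components[name] = contrast ** 2 / (8 * n)
    return components


def conditions_ss(totals, n):
    """Sum of squares between the 8 conditions, from their totals."""
    grand = sum(totals.values())
    return sum(t * t for t in totals.values()) / n - grand ** 2 / (8 * n)


# Hypothetical condition totals (n = 8 observations each), for illustration only.
n = 8
totals = dict(zip(product((0, 1), repeat=3), [96, 110, 101, 120, 99, 115, 104, 131]))
parts = partition_2x2x2(totals, n)
for name, ss in parts.items():
    print(f"{name:<24}{ss:10.4f}")
print(f"{'Total':<24}{sum(parts.values()):10.4f}  (conditions SS = {conditions_ss(totals, n):.4f})")
```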


8. THE NATURE OF THE ROW MEAN SQUARE 


In discussing the Latin square design used in the color- 
recognition problem investigated by Vinacke, it was pointed out 
that with repeated measurements for the same subjects across 
rows, the interpretation of the mean square for rows, if significant, 
may not be clear. For example, in Vinacke’s experiment, we 
would have no way of knowing whether the significance of this 
mean square was the result of over-all differences between subjects, 
or the particular order of presentation of the colors to the subjects, 
or perhaps both. Similarly, in the Bliss and Rose study, a
significant mean square for the rows might indicate significant
differences between the dogs in their reactions to the dosages, or it
might indicate that the order in which the 4 dosages were
administered was of importance. In the Bliss and Rose study, we
assumed that a sufficient time interval elapsed between the separate
dosages so that the row variation could be primarily attributed to
differences in the reactions of the animals. The major interest here
was not in the row mean square as such, but in removing this source
of variation from the error mean square. This will be true of many
Latin square designs, where the primary concern with the row mean
square is the removal of an important source of variation from the
error term.

If the order of administration of the drugs had been considered an
important factor (and, for the reason cited above, it apparently was
not), it would have been possible to design the experiment, still
using the Latin square, in a manner which would permit the
separation of the 2 sources of variation: that between animals and
that between orders of presentation. The technique involves drawing
a single Latin square and then replicating this same square several
times. It is thus a combination of the Latin square design and the
method of analysis of repeated observations described in the last
chapter.


9. AN EXPERIMENTAL DESIGN WITH REPLICATION OF THE SAME 
LATIN SQUARE 


The technique may be illustrated with a problem carried out 
in the University of Washington Psychological Laboratory.⁸ The
study was concerned with the ability of subjects to locate targets 
when they appeared on 5 circular screens varying in size. The 
smallest screen was 3 in. in diameter, and the diameters of the 
others increased by 1-in. intervals to the largest screen, which was 
7 in. Radii were marked on the screens at 20-deg. intervals for 
the particular phase of the investigation for which we report the 
data. Each screen was also marked by a series of expanding circles 
which were supposed to represent intervals of 10 miles from the 
center of the target. The screens were exposed to the subject by 
means of an automatic timer at the rate of one screen every 15 
sec. The screens had been photographed on a film strip and were 
projected to a ground glass plate. There were 36 projections for 
each of the 5 screen sizes, and the subject was required to locate 
the position of a target which appeared on the screen in terms of 
both degrees and miles. The data reported here are the judgments 
for degrees only. 

Twenty-five subjects were tested on all 5 of the different- 
sized screens, and the data are reported in terms of the number 


8 The data for this experiment were made available through the courtesy
of Dr. George P. Horton and Dr. Lloyd G. Humphreys. 




of judgments which were correct. Since we have 5 observations 
on each of 25 subjects, we have a total N of 125 with 124 degrees 
of freedom. The subjects were divided, however, into groups 
of 5 subjects each. A set of 5 subjects made up a 5 X 5 Latin 
square, with columns representing successive trials, rows repre- 
senting subjects and a given order of presentation, and the cell 
entries representing the size of the screen. The Latin square 
arrangement is shown in Table 71, and it is clear from the table 
that the same Latin square was replicated 5 times. The scores
of the subjects are recorded at the right of Table 71. 
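The five orders in Table 71 are related in a simple way: each row of the square is obtained from the row above by replacing every screen size with the next larger one, 7 in. wrapping round to 3 in. The short Python sketch below (names ours) rebuilds the square from its first row, checks that it is a Latin square, and maps each of the 25 subjects to a replication and an order, as in the table. Whether the square was originally constructed this way is not stated; the sketch only reproduces it.

```python
SIZES = (3, 4, 5, 6, 7)            # screen diameters in inches
ORDER_NAMES = ("I", "II", "III", "IV", "V")


def next_size(size):
    """Next larger screen size, with 7 in. wrapping round to 3 in."""
    return SIZES[(SIZES.index(size) + 1) % len(SIZES)]


def build_square(first_row):
    """Each new row steps every entry of the previous row up one size."""
    rows = [list(first_row)]
    while len(rows) < len(first_row):
        rows.append([next_size(s) for s in rows[-1]])
    return rows


square = build_square([3, 6, 4, 7, 5])          # order I of Table 71

# Latin property: every size appears once in each row and each column (trial).
assert all(sorted(row) == list(SIZES) for row in square)
assert all(sorted(col) == list(SIZES) for col in zip(*square))

for name, row in zip(ORDER_NAMES, square):
    print(name, "-".join(map(str, row)))

# Subjects 1-25: five replications of the same square, one subject per order.
assignment = {s: ((s - 1) // 5 + 1, ORDER_NAMES[(s - 1) % 5]) for s in range(1, 26)}
print(assignment[7])   # subject 7: second replication, order II
```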


10. ANALYSIS OF THE INDEPENDENT OBSERVATIONS 


In terms of the experimental designs discussed in the previous
chapter, let us first consider the uncorrelated data, i.e., the data
obtained from independent groups of subjects. A total of 24
degrees of freedom will be available for a sum of squares between
individual subjects in the rows of the table. This sum of squares
is obtained in the usual way and will be given by

$$\frac{(59)^2}{5} + \frac{(59)^2}{5} + \cdots + \frac{(63)^2}{5} - \frac{(1{,}693)^2}{125} = 495.408$$


Now if we had but a single Latin square, that is, if the same Latin
square were not replicated as it is here, then the sum of squares
based upon the above calculations would constitute a source of
variation which could be statistically isolated and removed from
the error sum of squares, but its meaning, if significant, would
not be clear. It would represent the variation attributable to 
both differences between the 5 subjects and differences between 
the 5 orders of presentation. But there would be no way to 
analyze further this sum of squares to separate the variation at- 
tributable to orders of presentation and the variation attributable 
to differences between subjects. 

The replication of the same Latin square, however, makes 
possible a further analysis or subdivision of the sum of squares 
based upon the differences between rows, which we have just 
calculated and found to be equal to 495.408. It is obvious from 
an examination of the data in the table that the first subject in 
each of the 5 Latin squares is tested under exactly the same order 




TABLE 71. The Latin Square Design and Outcomes of an Experiment on the
Ability to Locate Targets When Exposed on Different-Sized Screens.
Cell Entries Are the Scores for the Screen Size Corresponding to the
Latin Square at the Left

Latin                             Trials
Square    Order  Subjects    1    2    3    4    5     Sum
36475     I          1      11   15   10   13   10      59
47536     II         2       8   11   11   15   14      59
53647     III        3      10    8   13   11   17      59
64753     IV         4      13   12   16   14    9      64
75364     V          5      11    8    9   12   15      55
          Sum               53   54   59   65   65     296

36475     I          6       7   18   12   17   16      70
47536     II         7      13   13   13    5   10      54
53647     III        8      16   16   18   14   19      83
64753     IV         9      16   10   19   16   11      72
75364     V         10      20   18   10   18   17      83
          Sum               72   75   72   70   73     362

36475     I         11      12   17    9   19   18      75
47536     II        12      19   16   18   14   21      88
53647     III       13      15   15   16   14   14      74
64753     IV        14      13   15   13   13    9      63
75364     V         15      16   14   10   12   13      65
          Sum               75   77   66   72   75     365

36475     I         16      13   16   17   15   15      76
47536     II        17       8   17   12   14   17      68
53647     III       18      12   14   15   13   16      70
64753     IV        19      11   10   13   11    7      52
75364     V         20      14   15   16   16   17      78
          Sum               58   72   73   69   72     344

36475     I         21      14   17   18   18   15      82
47536     II        22       9    9   16   10   11      55
53647     III       23      12   10   14   12   16      64
64753     IV        24       6   16   13   17   10      62
75364     V         25      11   12   10   15   15      63
          Sum               52   64   71   72   67     326


of presentation. And this is true of the second subject in each 
of the Latin squares, the third subject, the fourth subject, and 
the fifth subject. Hence, it is possible to find 5 sums, each of 
which will represent a particular order of presentation of the 
targets. 




For example, the sum for order 3-6-4-7-5 would be found
by adding the sums of scores for the first subject in each of the
5 Latin squares. This would give us 59 + 70 + 75 + 76 + 82 =
362. The sum for the order 4-7-5-3-6 would be given by adding
the sums of scores for the second subject in each of the Latin squares.
This would give us 59 + 54 + 88 + 68 + 55 = 324. In the
same manner we find the sums for the other 3 orders of
presentation. These sums are given in Table 72.


TABLE 72. Sums of Scores for Order of Presentation of the Screen Sizes

        3-6-4-7-5   4-7-5-3-6   5-3-6-4-7   6-4-7-5-3   7-5-3-6-4
            59          59          59          64          55
            70          54          83          72          83
            75          88          74          63          65
            76          68          70          52          78
            82          55          64          62          63
Sum        362         324         350         313         344


Using the sums obtained above, it is possible to calculate
a sum of squares for order of presentation in the usual manner.
Keeping in mind that each of the sums is based upon 25
observations, we obtain

$$\frac{(362)^2}{25} + \frac{(324)^2}{25} + \frac{(350)^2}{25} + \frac{(313)^2}{25} + \frac{(344)^2}{25} - \frac{(1{,}693)^2}{125} = 63.008$$


The number of degrees of freedom for this sum of squares will be, 
as usual, equal to 1 less than the number of sums involved, or 4. 

Then the residual sum of squares between subjects
may be calculated directly⁹ from the data of Table 71 or obtained
by subtraction. By subtraction we would have

$$\text{Residual between subjects} = 495.408 - 63.008 = 432.400$$

The degrees of freedom for the residual sum of squares 432.400
may also be obtained by subtraction, giving 24 − 4 = 20. Or


9 The direct method of calculation has been described in the previous 
chapter. 




we may note that by the direct method of calculation of this sum
of squares, we would have 4 degrees of freedom for each of the
orders of presentation, and since we have 5 orders, we would have
(4)(5) = 20 degrees of freedom. The similarity of this part of
our analysis to the usual two-part analysis of variance may be 
observed from the data as arranged in Table 72. It is also obvious 
that the 5 sets of scores are based upon independent, randomly 
assigned subjects. The residual mean square between subjects 
may thus be used to test the significance of the mean square for 
orders of presentation. 
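The between-subjects part of the analysis can be checked with exact rational arithmetic, which avoids the rounding that creeps into long hand computations. The Python sketch below works from the subject sums arranged by order in Table 72, each sum covering 5 trials; the names are illustrative.

```python
from fractions import Fraction

# Table 72: for each order of presentation, the 5-trial sums of the
# 5 subjects (one from each replication) who received that order.
table72 = {
    "3-6-4-7-5": [59, 70, 75, 76, 82],
    "4-7-5-3-6": [59, 54, 88, 68, 55],
    "5-3-6-4-7": [59, 83, 74, 70, 64],
    "6-4-7-5-3": [64, 72, 63, 52, 62],
    "7-5-3-6-4": [55, 83, 65, 78, 63],
}
TRIALS = 5                                        # observations per subject sum
subject_sums = [s for sums in table72.values() for s in sums]
n_obs = TRIALS * len(subject_sums)                # 125
C = Fraction(sum(subject_sums) ** 2, n_obs)       # correction term for origin

ss_between = sum(Fraction(s * s, TRIALS) for s in subject_sums) - C
ss_orders = sum(Fraction(sum(sums) ** 2, TRIALS * len(sums))
                for sums in table72.values()) - C
ss_resid = ss_between - ss_orders                 # residual between subjects

df_orders = len(table72) - 1                                  # 4
df_resid = sum(len(sums) - 1 for sums in table72.values())    # 20
ms_orders = ss_orders / df_orders
ms_resid = ss_resid / df_resid

print(float(ss_between), float(ss_orders), float(ss_resid))
print(round(float(ms_orders), 2), round(float(ms_resid), 2),
      round(float(ms_orders / ms_resid), 2))
```

The F ratio for orders of presentation comes out below 1.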


11. ANALYSIS OF THE CORRELATED OBSERVATIONS 


Now let us consider the remaining variation in Table 71. 
We have accounted for 24 of the total of 124 degrees of freedom. 
What about the remaining 100 degrees of freedom? We may 
compute a total sum of squares, in the usual manner, by squaring
each of the 125 cell entries of Table 71, summating, and then
subtracting the correction term for origin. This sum of squares
will be given by

$$(11)^2 + (8)^2 + (10)^2 + \cdots + (15)^2 - \frac{(1{,}693)^2}{125} = 1{,}327.008$$

From the total sum of squares, we subtract the 2 sums of squares 
we have already calculated, 432.400 and 63.008, and this will give 
us a sum of squares within individuals (rows), based upon 100 
degrees of freedom. Thus, 


$$1{,}327.008 - 432.400 - 63.008 = 831.600$$


It is the remaining variation within individuals (rows), based 
upon 100 degrees of freedom, that we wish to analyze. 
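The total and within-individuals sums of squares need every score. The Python sketch below holds the 125 scores of Table 71 and reproduces the 1,327.008 total and the split into 495.408 between and 831.600 within individuals, again with exact fractions; names are ours.

```python
from fractions import Fraction

# Scores from Table 71: 25 subjects x 5 trials. Subjects 1-5 form the
# first replication of the Latin square, 6-10 the second, and so on.
scores = [
    [11, 15, 10, 13, 10], [8, 11, 11, 15, 14], [10, 8, 13, 11, 17],
    [13, 12, 16, 14, 9], [11, 8, 9, 12, 15],
    [7, 18, 12, 17, 16], [13, 13, 13, 5, 10], [16, 16, 18, 14, 19],
    [16, 10, 19, 16, 11], [20, 18, 10, 18, 17],
    [12, 17, 9, 19, 18], [19, 16, 18, 14, 21], [15, 15, 16, 14, 14],
    [13, 15, 13, 13, 9], [16, 14, 10, 12, 13],
    [13, 16, 17, 15, 15], [8, 17, 12, 14, 17], [12, 14, 15, 13, 16],
    [11, 10, 13, 11, 7], [14, 15, 16, 16, 17],
    [14, 17, 18, 18, 15], [9, 9, 16, 10, 11], [12, 10, 14, 12, 16],
    [6, 16, 13, 17, 10], [11, 12, 10, 15, 15],
]
flat = [y for row in scores for y in row]
C = Fraction(sum(flat) ** 2, len(flat))            # (1,693)^2 / 125

ss_total = sum(y * y for y in flat) - C
ss_between = sum(Fraction(sum(row) ** 2, len(row)) for row in scores) - C
ss_within = ss_total - ss_between

print("total  ", len(flat) - 1, float(ss_total))
print("between", len(scores) - 1, float(ss_between))
print("within ", len(flat) - len(scores), float(ss_within))
```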

Let us combine all 5 of our Latin squares as in Table 73.
The manner in which this is accomplished may be illustrated by
showing the source of the cell entry in the first row and first column
of the table, which is 57. This value is obtained by adding the
cell entries in the first row and first column of all 5 of the Latin
squares. Thus, 11 + 7 + 12 + 13 + 14 = 57. Similarly, the
entry in the third row and fourth column is obtained by adding
11 + 14 + 14 + 13 + 12 = 64. Each cell entry in Table 73 is
thus based upon 5 observations.


TABLE 73. Sums of Scores for Trials and Order of Presentation in the
Target-Location Experiment

                          Trials
Order          1      2      3      4      5       Sum
3-6-4-7-5     57     83     66     82     74       362
4-7-5-3-6     57     66     70     58     73       324
5-3-6-4-7     65     63     76     64     82       350
6-4-7-5-3     59     63     74     71     46       313
7-5-3-6-4     72     67     55     73     77       344
Sum          310    342    341    348    352     1,693


From the data in Table 73, we may compute a sum of squares 
for trials based upon the sums 310, 342, 341, 348, and 352. This 
sum of squares will be given by 

$$\frac{(310)^2}{25} + \frac{(342)^2}{25} + \frac{(341)^2}{25} + \frac{(348)^2}{25} + \frac{(352)^2}{25} - \frac{(1{,}693)^2}{125} = 44.128$$


We may also, in terms of the usual Latin square, compute a sum 
of squares for the experimental condition, namely, size of screen. 
This sum of squares will be based upon the sums for the 5 screen 
sizes. The sum for the 3-in. screen may be obtained by adding 
the entries in Table 73, corresponding to the 3-in. screen. Thus, 
57 + 58 + 63 + 46 + 55 = 279 will be the sum for the 3-in. 
screen. The other 4 sums corresponding to the 4-in., 5-in., 6-in., 
and 7-in. screens may be found in the same way. They are 327, 
347, 364, and 376, respectively. From these sums we may obtain 
the sum of squares for the screen size which will be given by 


$$\frac{(279)^2}{25} + \frac{(327)^2}{25} + \frac{(347)^2}{25} + \frac{(364)^2}{25} + \frac{(376)^2}{25} - \frac{(1{,}693)^2}{125} = 232.048$$


There is still an additional sum of squares which we may 
find. This is the residual sum of squares, based upon the devia- 
tions as shown by formula (73), which is used as the error sum of 
squares in a single Latin square. We first square each of the cell
entries in Table 73, divide each square by 5, the number of
observations on which it is based, summate, and subtract the
correction term for origin. Thus,

$$\frac{(57)^2}{5} + \frac{(57)^2}{5} + \frac{(65)^2}{5} + \cdots + \frac{(77)^2}{5} - \frac{(1{,}693)^2}{125} = 414.208$$

Then the residual sum of squares may be obtained by subtraction
in terms of formula (70). Thus,

$$414.208 - 44.128 - 63.008 - 232.048 = 75.024$$

Now let us see what has happened to the sum of squares 
within individuals 831.600, based upon 100 degrees of freedom. 
We have accounted for one portion by the variation between 
trials which is equal to 44.128 and which is based upon 4 degrees 
of freedom. A second portion is that attributable to screen size 
which is equal to 232.048 and which is based upon 4 degrees of 
freedom. A third, the residual error sum of squares for the Latin 
square design, is equal to 75.024 and is based upon 12 degrees of 
freedom. If we subtract these sums of squares and degrees of 
freedom from the within-individuals sum of squares and degrees 
of freedom, we shall have a residual sum of squares within subjects 
given by 

$$831.600 - 44.128 - 232.048 - 75.024 = 480.400$$

with degrees of freedom given by

$$100 - 4 - 4 - 12 = 80$$
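Apart from the within-individuals total itself, the whole breakdown can be reproduced from Table 73. The Python sketch below (names ours) recovers the screen-size totals by reading each order string, computes the cell, order, trial, and screen-size sums of squares exactly, and takes the within-individuals total 831.600 from the text to obtain the residual within subjects.

```python
from fractions import Fraction

# Table 73: rows are orders of presentation (screen sizes in trial order),
# columns are trials 1-5; each entry is the sum of 5 subjects' scores.
orders = ["3-6-4-7-5", "4-7-5-3-6", "5-3-6-4-7", "6-4-7-5-3", "7-5-3-6-4"]
cells = [
    [57, 83, 66, 82, 74],
    [57, 66, 70, 58, 73],
    [65, 63, 76, 64, 82],
    [59, 63, 74, 71, 46],
    [72, 67, 55, 73, 77],
]
PER_CELL = 5                          # subjects contributing to each cell
N = PER_CELL * 25                     # 125 observations in all
C = Fraction(sum(map(sum, cells)) ** 2, N)


def ss_from_totals(totals, obs_per_total):
    """Sum of squares between groups, computed from group totals."""
    return sum(Fraction(t * t, obs_per_total) for t in totals) - C


# Screen-size totals: the k-th digit of an order is the size used on trial k.
screen_totals = {}
for order, row in zip(orders, cells):
    for size, value in zip(map(int, order.split("-")), row):
        screen_totals[size] = screen_totals.get(size, 0) + value

ss_cells = ss_from_totals([v for row in cells for v in row], PER_CELL)
ss_orders = ss_from_totals([sum(row) for row in cells], 25)
ss_trials = ss_from_totals([sum(col) for col in zip(*cells)], 25)
ss_screens = ss_from_totals(screen_totals.values(), 25)
ss_latin_resid = ss_cells - ss_orders - ss_trials - ss_screens

# Within-individuals total taken from the text (it needs all 125 scores).
ss_within = Fraction(831600, 1000)
ss_within_resid = ss_within - ss_trials - ss_screens - ss_latin_resid

print(sorted(screen_totals.items()))
for name, ss in [("trials", ss_trials), ("screen size", ss_screens),
                 ("Latin square residual", ss_latin_resid),
                 ("residual within subjects", ss_within_resid)]:
    print(f"{name:<26}{float(ss):9.3f}")
```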


Now let us note the 2 residual sums of squares, that ob- 
tained from the Latin square which is equal to 75.024 and which 
is based upon 12 degrees of freedom, and that obtained from within 
subjects which is equal to 480.400 and which is based upon 80 
degrees of freedom. Both of these residual sums of squares when 
divided by their respective degrees of freedom are estimates of 
the uncontrolled variation which we have called experimental 
error. Consequently, we should not expect them to differ from 
each other except within the limits of random sampling. That 
they do not may easily be determined by computing
$s_1^2 = 75.024/12 = 6.25$ and $s_2^2 = 480.400/80 = 6.00$. Then $F$ will
be given by $6.25/6.00 = 1.04$, and the agreement between the 2
estimates is obviously very good.¹⁰ There is thus no reason why these 2
mean squares should not be combined to obtain a pooled error
mean square based upon 92 degrees of freedom. The pooled
error mean square could then be used to test the significance of 
the other mean squares for which it is the appropriate error. The 
pooling has not been done, however, in Table 74 where the analysis 


TABLE 74. Analysis of Variance of the Target-Location Experiment

                                          Sum of            Mean
Source of Variation                       Squares     df    Square     F
Independent observations:
  Order of presentation                    63.008      4    15.75
  Residual between individuals (error)    432.400     20    21.62
  Total between individuals               495.408     24
Correlated observations:
  Size of screen                          232.048      4    58.01     9.67
  Trials                                   44.128      4    11.03     1.84
  Residual from Latin square (error)       75.024     12     6.25     1.04
  Residual within individuals (error)     480.400     80     6.00
  Total within individuals                831.600    100
Total for experiment                    1,327.008    124


is summarized, because we wished to keep the component parts 
of the analysis clear before the reader. 
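The last step, forming mean squares, F ratios, and the pooled error term, is sketched below in Python from the sums of squares and degrees of freedom of Table 74 (names ours). One small detail: computed from unrounded mean squares (58.012/6.005), the F for size of screen is 9.66, whereas the 9.67 in Table 74 comes from the rounded values 58.01/6.00. Using the pooled error, based on 92 degrees of freedom, changes the F ratios only slightly.

```python
# Sums of squares and degrees of freedom from Table 74.
table74 = {
    "Size of screen": (232.048, 4),
    "Trials": (44.128, 4),
    "Residual from Latin square": (75.024, 12),
    "Residual within individuals": (480.400, 80),
}
ms = {name: ss / df for name, (ss, df) in table74.items()}
error = ms["Residual within individuals"]          # error term used in Table 74

f_screen = ms["Size of screen"] / error
f_trials = ms["Trials"] / error
f_resid = ms["Residual from Latin square"] / error

# Pooling the two residual estimates of experimental error.
pooled_ss = table74["Residual from Latin square"][0] + table74["Residual within individuals"][0]
pooled_df = table74["Residual from Latin square"][1] + table74["Residual within individuals"][1]
pooled_ms = pooled_ss / pooled_df

print(f"F (screen) = {f_screen:.2f}, F (trials) = {f_trials:.2f}, F (residuals) = {f_resid:.2f}")
print(f"pooled error: {pooled_ss:.3f} / {pooled_df} = {pooled_ms:.3f}")
print(f"with pooled error: F (screen) = {ms['Size of screen'] / pooled_ms:.2f}, "
      f"F (trials) = {ms['Trials'] / pooled_ms:.2f}")
```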


12. SUMMARY OF THE ANALYSIS 


In Table 74 we may note that we have accomplished our 
original objective: an unconfounding, so to speak, of the source of 
variation for order of presentation. It is important to note that 
this would not have been possible without replication of the same 
Latin square. We have also separated the mean squares based 
upon independent observations from those based upon observa- 


10 In a series of 16 tests based upon experimental designs such as the 
one described here, it is worthwhile as an empirical note to add that in 8 cases 
the residual mean square within subjects was larger than the residual mean 
square obtained from the Latin square and in 8 cases was smaller. The ratio 
of the average values for the 2 mean squares was equal to .96. 




tions of the same subjects in the manner of the analyses of the 
last chapter. The method of analysis here described is simply 
a bringing together of the principles of the Latin square design 
and the principles of the previous chapter. As such, the design 
should have widespread applications in psychological and educa- 
tional experiments, where repeated measures on the same subjects 
are necessary and vital parts of the design, but where an isolation 
of the variation attributable to order of presentation of the experi- 
mental variables and a test of significance of this source of varia- 
tion is of importance also. 

One of the traditional designs in psychological experiments 
on learning, for example, where 3 methods A, B, and C are being 
investigated, has usually involved 3 groups of subjects, each 
group being tested under all 3 methods, with rotated orders of 
presentation. Thus, one group may be tested in the order A, B,
and C; a second group with the order B, C, and A; and the third 
group with the order C, A, and B. This is essentially a Latin 
square design, and with replication, a test of significance of order 
of presentation, trials, and methods could easily be incorporated 
into the investigation. 


13. EXAMPLES 


1. Here is an easy Latin square for practice. Assume that
the letters A, B, C, D, and E correspond to different experimental
conditions or treatments, and that the columns correspond to
different periods of testing and the rows to individual subjects.

             Time of Testing                      Time of Testing
Subjects    1   2   3   4   5        Subjects    1   2   3   4   5
   1        B   E   D   C   A           1        6   8   5   9   1
   2        C   A   B   E   D           2        2   1   3   8   9
   3        D   B   C   A   E           3        5   2   5   3   4
   4        E   C   A   D   B           4        1   5   9   1   5
   5        A   D   E   B   C           5        4   9   6   1   3


Since the data in the cell entries above were selected from 
a table of random numbers, there is no reason to believe that 
significant differences will be found between rows, columns, or 
experimental conditions. The data, however, should serve to 


396 EXPERIMENTAL DESIGN IN PSYCHOLOGICAL RESEARCH 


by 6.25/6.00 — 1.04, and the agreement between the 2 estimates 
is obviously very good.’° There is thus no reason why these 2 
mean squares should not be combined to obtain a pooled error 
mean square based upon 92 degrees of freedom. ‘The pooled 
error mean square could then be used to test the significance of 
the other mean squares for which it is the appropriate error. The 
pooling has not been done, however, in Table 74 where the analysis 


TABLE 74. Analysis of Variance of the Target-Location Experiment 


Sum of Mean 


Source of Variation Squares df Square F 
Independent observations: 
Order of presentation 63.008 4 15.75 
Residual between individuals (error) 432.400 20 21.62 
Total between individuals 495.408 24 
Correlated observations: 
Size of screen 232.048 4 5801 9.67 
Trials 44.128 4 1103 184 
Residual from Latin square (error) 75.024 12 6.25 1.04 
Residual within individuals (error) 480.400 80 6.00 ` 
Total within individuals 831.600 100 
Total for experiment 1,327.008 124 


—————————— 


is summarized, because we wished to keep the component parts 
of the analysis clear before the reader. 


12. SUMMARY OF THE ANALYSIS 


In Table 74 we may note that we have accomplished our 
original objective: an unconfounding, so to speak, of the source of 
variation for order of presentation. It is important to note that 
this would not have been possible without replication of the same 
Latin square. We have also separated the mean squares based 
upon independent observations from those based upon observa- 


10 In a series of 16 tests based upon experimental designs such as the 
one described here, it is worthwhile as an empirical note to add that in 8 cases 
the residual mean square within subjeets was larger than the residual mean 
square obtained from the Latin square and in 8 cases was smaller. The ratio 
of the average values for the 2 mean squares was equal to .96. 


ği 


wes 


` 


e e 


LATIN SQUARE DESIGN IN PSYCHOLOGICAL RESEARCH 327 


tions of the same subjects in the manner of the analyses of the 
last chapter. The method of analysis here described is simply 
a bringing together of the principles of the Latin square design 
and the principles of the previous chapter. As such, the design 
should have widespread applications in psychological and educa- 
tional experiments, where repeated measures on the same subjects 
are necessary and vital parts of the design, but where an isolation 
of the variation attributable to order of presentation of the experi- 
mental variables and a test of significance of this source of varia- 
tion is of importance also. 

One of the traditional designs in psychological experiments 
on learning, for example, where 3 methods A, B, and C are being 
investigated, has usually involved 3 groups of subjects, each 
group being tested under all 3 methods, with rotated orders of 
presentation. Thus, one group may be tested in the order A, B, 
and C; a second group with the order B, C, and A; and the third 
group with the order C, A, and B. This is essentially a Latin 
square design, and with replication, a test of significance of order 
of presentation, trials, and methods could easily be incorporated 


into the investigation. 


13. EXAMPLES 


1. Here is an easy Latin square for practice. Assume that 
the letters A, B, C, D, and E correspond to different experimental 
conditions or treatments, and that the columns correspond to 
different periods of testing and the rows to individual subjects. 


           Time of Testing                    Time of Testing
Subjects   1  2  3  4  5       Subjects     1  2  3  4  5
   1       B  E  D  C  A          1         6  8  5  9  1
   2       C  A  B  E  D          2         2  1  3  8  9
   3       D  B  C  A  E          3         5  2  5  3  4
   4       E  C  A  D  B          4         1  5  9  1  5
   5       A  D  E  B  C          5         4  9  6  ?  ?


Since the data in the cell entries above were selected from 
a table of random numbers, there is no reason to believe that 
significant differences will be found between rows, columns, or 
experimental conditions. The data, however, should serve to
illustrate in simple fashion the necessary calculations. (a) Find
the sums of squares for rows, columns, treatments, and error, and 
make the tests of significance. (b) Set up the table of “expected” 
values and calculate directly the error or residual sum of squares. 
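The computations asked for in (a) and (b) can be sketched in Python as below. This is an illustrative helper (`latin_square_anova` is a hypothetical name), checked here on a small made-up 3 × 3 square rather than on the data of this example. It assumes the usual additive model, in which the "expected" value of a cell is the row mean plus the column mean plus the treatment mean, minus twice the grand mean; the residual obtained directly from these expected values should agree with the residual obtained by subtraction.

```python
from collections import defaultdict


def latin_square_anova(design, data):
    """Sums of squares for an n x n Latin square.

    design[i][j] is the treatment letter in row i, column j, and
    data[i][j] is the observation recorded in that cell.
    """
    n = len(data)
    cells = [(i, j) for i in range(n) for j in range(n)]
    grand = sum(data[i][j] for i, j in cells)
    cf = grand ** 2 / n ** 2                      # correction term
    total = sum(data[i][j] ** 2 for i, j in cells) - cf

    row_sums = [sum(row) for row in data]
    col_sums = [sum(data[i][j] for i in range(n)) for j in range(n)]
    treat_sums = defaultdict(float)
    for i, j in cells:
        treat_sums[design[i][j]] += data[i][j]

    rows = sum(s ** 2 for s in row_sums) / n - cf
    cols = sum(s ** 2 for s in col_sums) / n - cf
    treats = sum(s ** 2 for s in treat_sums.values()) / n - cf
    residual = total - rows - cols - treats       # error by subtraction

    # Part (b): "expected" cell value = row mean + column mean
    # + treatment mean - 2 * grand mean; residual SS obtained directly.
    grand_mean = grand / n ** 2
    direct = sum(
        (data[i][j] - (row_sums[i] / n + col_sums[j] / n
                       + treat_sums[design[i][j]] / n - 2 * grand_mean)) ** 2
        for i, j in cells)

    return {"total": total, "rows": rows, "columns": cols,
            "treatments": treats, "residual": residual,
            "residual_direct": direct,
            "df_effect": n - 1, "df_residual": (n - 1) * (n - 2)}


# Hypothetical 3 x 3 illustration (not the data of Example 1).
design = [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]
data = [[1, 2, 3], [4, 6, 5], [8, 7, 9]]
result = latin_square_anova(design, data)
print(result)
```

Each effect mean square is its sum of squares divided by n − 1, and the error mean square is the residual divided by (n − 1)(n − 2); the F ratios follow in the usual way.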

2. Sleight (1948) used a Latin square design to study the 
influence of the shape of instrument dials and exposure time on 
legibility. Five dial shapes were used: (H)orizontal, (O)pen 
window, (R)ound, (V)ertical, and (S)emicircular. The exposure
times used were .28, .20, .17, .14, and .12 sec. In a preliminary 
experiment, 5 subjects were tested. The measurements reported 
are the number of errors made by the subject in reading the various 
dials under the various exposure times. The design and data 
are given below. 


Errors Made by Five Subjects in the Preliminary Experiment 


          Exposure Speed in Seconds              Exposure Speed in Seconds
Subjects  .28  .20  .17  .14  .12      Subjects  .28  .20  .17  .14  .12
   1       H    O    S    R    V          1       3    0    4    2    6
   2       S    R    V    H    O          2       ?    ?    ?    ?    ?
   3       V    H    O    S    R          3       ?    6    1    6    0
   4       O    S    R    V    H          4       0    4    4   12    2
   5       R    V    H    O    S          5       ?    6    5    0    ?


(a) Analyze the total sum of squares into the various com- 
ponent parts and test for significance of the dial shapes. (b) Since 
these data are small counts with quite a few zero entries, find the 
mean and the variance for each of the dial shapes. Note that 
these values are correlated and that the variance tends to be 
equal to the mean with but one exception. This indicates that 
a transformation of the original data is in order, and the trans- 
formation that is suggested is to add .5 to each cell entry and then 
take the square root. Make this transformation and find the 
mean and variance of the transformed variate for each dial shape. 
Note that the variances have become more or less stabilized. 
(c) Apply the analysis of variance to the transformed data. Are 
any conclusions changed by the new analysis? 
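The steps in (b) and (c) can be sketched in Python as follows. This is a minimal illustration with made-up counts for two hypothetical dial shapes (not Sleight's data), and the helper names are hypothetical. It computes the mean and the (n − 1) variance of each group before and after the transformation sqrt(X + .5), showing the variances of small counts tracking their means and then drawing closer together after the transformation.

```python
import math
from statistics import mean, variance


def mean_and_variance(groups):
    """Mean and (n - 1) variance for each named group of scores."""
    return {name: (mean(v), variance(v)) for name, v in groups.items()}


def sqrt_half(counts):
    """Square-root transformation sqrt(X + .5) for small counts."""
    return [math.sqrt(c + 0.5) for c in counts]


# Hypothetical error counts for two dial shapes (not Sleight's data).
raw = {"shape 1": [0, 1, 0, 2, 1], "shape 2": [6, 9, 4, 12, 7]}
transformed = {name: sqrt_half(v) for name, v in raw.items()}

for label, groups in (("raw", raw), ("sqrt(X + .5)", transformed)):
    for name, (m, v) in mean_and_variance(groups).items():
        print(f"{label:>13} {name}: mean = {m:.3f}, variance = {v:.3f}")
```

In these made-up counts the raw variances differ by a factor of more than 10, while after the transformation they differ by a factor of about 2.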

3. The following Latin square is taken from the study by 
Bliss and Rose (1940) cited in the chapter. 




             Days                              Days
Dogs    1    2    3    4      Dogs     1      2      3      4
 1      U1   U2   S1   S2      1      14.0   13.8   14.0   14.0
 2      U2   U1   S2   S1      2      16.2   14.0   13.0   13.0
 3      S2   S1   U2   U1      3      13.0   14.0   14.0   13.0
 4      S1   S2   U1   U2      4      13.2   16.0   14.9   16.4


Analyze the total sum of squares into the components attributable to “days,” “dogs,” “doses,” and “residual” in the
manner described in the chapter. The computations will be 
simplified considerably by subtracting 12.0 from each cell entry. 
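Subtracting the same constant from every entry shifts each mean by that constant but leaves every deviation, and hence every sum of squares, unchanged. The short Python sketch below (the helper name `sum_of_squares` is hypothetical) checks this for the total sum of squares of the square above; the same reasoning applies to the sums of squares for days, dogs, doses, and residual.

```python
def sum_of_squares(values):
    """Sum of squared deviations from the mean: sum(X^2) - (sum X)^2 / N."""
    n = len(values)
    return sum(v * v for v in values) - sum(values) ** 2 / n


# Cell entries of the Example 3 square, row by row (dogs 1-4, days 1-4).
raw = [14.0, 13.8, 14.0, 14.0,
       16.2, 14.0, 13.0, 13.0,
       13.0, 14.0, 14.0, 13.0,
       13.2, 16.0, 14.9, 16.4]
coded = [round(v - 12.0, 1) for v in raw]   # subtract 12.0 from each entry

print(coded[:4])                            # smaller numbers to square by hand
print(sum_of_squares(raw), sum_of_squares(coded))
```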

4. Another Latin square from the same study by Bliss and
Rose (1940) had the following arrangement:


             Days                              Days
Dogs    1    2    3    4      Dogs     1      2      3      4
 1      S1   U1   S2   U2      1      14.2   14.1   15.0   14.4
 2      U1   S1   U2   S2      2      13.0   13.4   13.8   14.0
 3      U2   S2   U1   S1      3      15.8   16.0   15.0   15.4
 4      S2   U2   S1   U1      4      15.2   16.2   15.0   15.3


Analyze the total sum of squares for this Latin square in the
same manner as the one in Example 3. Again computations may
be simplified by subtracting 12.0 from each cell entry and then 
performing the analysis with these reduced values. 

5. Compare the 2 analyses of Examples 3 and 4. Note 
that in Example 4 the residual mean square is much smaller than 
in Example 3. This means that, knowing the dog means, the day means, and the dosage means, we can predict the reaction of the dog much more accurately in Example 4 than in Example 3. (a) Test the significance of the difference between the 2 residual mean squares of Examples 3 and 4. (b) What interpretation may be placed upon this highly significant value of F?
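For part (a), the 2 residual mean squares are compared by their ratio, with the larger placed in the numerator; each residual in a 4 × 4 square has (4 − 1)(4 − 2) = 6 degrees of freedom. A minimal Python sketch follows; the function name is hypothetical and the mean squares in the usage line are placeholders, not the values of Examples 3 and 4. When the larger value is placed on top without a prior hypothesis about which square should be more variable, the tabled probabilities of F should be doubled.

```python
def variance_ratio(ms_a, df_a, ms_b, df_b):
    """F = larger mean square / smaller, with degrees of freedom in that order."""
    if ms_a >= ms_b:
        return ms_a / ms_b, df_a, df_b
    return ms_b / ms_a, df_b, df_a


# For a 4 x 4 Latin square the residual has (4 - 1)(4 - 2) = 6 df.
df_residual = (4 - 1) * (4 - 2)
# Hypothetical residual mean squares, for illustration only.
f, df1, df2 = variance_ratio(0.90, df_residual, 0.06, df_residual)
print(f"F = {f:.2f} with {df1} and {df2} degrees of freedom")
```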

6. If the 2 residual mean squares of Examples 3 and 4 
had not been found to differ significantly, the data of the 2 Latin 
squares might have been combined. Show the components of 
the total sum of squares and the degrees of freedom associated 
with each if this had been possible. 

7. De Lury (1946) reports upon a Latin square design concerned with the investigation of the reactions of rabbits to 4 different doses of a drug. The observations reported below are in terms of milligrams of glucose per 100 cc. of blood. Four replications of the Latin square arrangement were used. The different doses may be referred to by the letters A, B, C, and D.


              Days                             Days
Rabbits   1    2    3    4      Rabbits    1    2    3    4
   1      C    A    B    D         1      59   56   41   54
   2      B    D    C    A         2      56   58   73   69
   3      A    C    D    B         3      45   41   30   28
   4      D    B    A    C         4      62   49   63   84
   5      A    B    D    C         5      42   39   44   61
   6      C    A    B    D         6      49   61   38   43
   7      B    D    C    A         7      83   81  101   96
   8      D    C    A    B         8      56   54   65   58
   9      B    D    A    C         9      47   46   62   76
  10      A    C    B    D        10      90   74   61   63
  11      C    B    D    A        11      79   63   58   87
  12      D    A    C    B        12      50   69   66   59
  13      D    A    C    B        13      45   61   45   71
  14      C    B    D    A        14      52   31   35   81
  15      A    D    B    C        15      57   30   57   50
  16      B    C    A    D        16      64   83   74    ?


The data reported above by De Lury were made available 
through the courtesy of Dr. D. M. Young and the Connaught 
Laboratories of the University of Toronto. (a) Analyze the data 
for each of the Latin squares separately. (b) Test the residual 
mean squares, each based upon 6 degrees of freedom, for homo- 
geneity, using Bartlett's test. 
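Bartlett's test for part (b) can be sketched in Python as follows. This is a minimal sketch of the standard corrected statistic, using natural logarithms (with common logarithms the statistic must be multiplied by 2.3026); the function name is hypothetical and the mean squares in the usage line are placeholders, not the results of this example. The statistic is referred to the chi-square table with k − 1 degrees of freedom, which is 3 for the 4 squares here.

```python
import math


def bartlett(mean_squares, dfs):
    """Bartlett's test for homogeneity of k mean squares.

    Returns the corrected chi-square statistic and its k - 1 degrees
    of freedom.
    """
    k = len(mean_squares)
    total_df = sum(dfs)
    pooled = sum(d * s for d, s in zip(dfs, mean_squares)) / total_df
    m = total_df * math.log(pooled) - sum(
        d * math.log(s) for d, s in zip(dfs, mean_squares))
    c = 1 + (sum(1 / d for d in dfs) - 1 / total_df) / (3 * (k - 1))
    return m / c, k - 1


# Hypothetical residual mean squares from four squares, 6 df each.
chi2, df = bartlett([40.0, 55.0, 120.0, 75.0], [6, 6, 6, 6])
print(f"chi-square = {chi2:.2f} with {df} df")
```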

8. The airplane-location experiment described in this chapter was concerned not only with the ability of the subjects to locate the position of the target in terms of degrees, but also with their ability to locate the distance of the plane from the center of the target. The latter measures were in terms of miles. The experimental design and analysis are exactly the same as those of the experiment described in the chapter. Working through the computations for this example will do much to clarify the nature of the analysis.


The data are given below:


Order   Latin Square            Subjects
                         1    2    3    4    5     Sum
I       3 6 4 7 5       19   21   25   27   22     114
II      4 7 5 3 6       23   30   29   24   28     134
III     5 3 6 4 7       18   16   24   24   18     100
IV      6 4 7 5 3       27   24   27   29   26     133
V       7 5 3 6 4       29   28   21   30   24     132
Sum                    116  119  126  134  118     613

I       3 6 4 7 5       22   20   23   31   24     120
II      4 7 5 3 6       24   33   28   19   32     136
III     5 3 6 4 7       29   26   29   29   27     140
IV      6 4 7 5 3       22   14   12   18   15      81
V       7 5 3 6 4       30   31   30   33   30     154
Sum                    127  124  122  130  128     631

I       3 6 4 7 5       26   28   26   31   32     143
II      4 7 5 3 6       29   30   31   29   28     147
III     5 3 6 4 7       30   27   30   30   31     148
IV      6 4 7 5 3       28   30   34   31   28     151
V       7 5 3 6 4       19   19   17   25   24     104
Sum                    132  134  138  146  143     693

I       3 6 4 7 5       17   20   17   14   18      86
II      4 7 5 3 6       24   26   27   25   31     133
III     5 3 6 4 7       25   22   28   26   30     131
IV      6 4 7 5 3       23   22   25   23   16     109
V       7 5 3 6 4       28   20   16   20   21     105
Sum                    117  110  113  108  116     564

I       3 6 4 7 5       28   30   30   31   28     147
II      4 7 5 3 6       11   18   27   18   24      98
III     5 3 6 4 7       15   17   15   16   15      78
IV      6 4 7 5 3       29   26   28   28   26     137
V       7 5 3 6 4       24   27   25   28   27     131
Sum                    107  118  125  121  120     591

Total                  599  605  624  639  625    3092


9. Thomson (1941) reports an analysis of data originally published by Nisbet (1930). Although the original study did not make use of the Latin square analysis, it will serve to illustrate possible applications in educational experiments. In the original investigation, the experimental conditions were arranged systematically about the diagonal of the square. We may do as Thomson did for purposes of illustration and assume that the random assignment happened to yield the symmetrical arrangement. In Nisbet's study there were 4 treatments or experimental conditions. These consisted of 4 ways of testing spelling: multiple-choice, second dictation, wrongly spelled words, and skeleton words. Four groups of children were used, and 4 lists of words. An original dictation test of all words was given to all groups before the tests mentioned above. The data reported are the number of words wrong in the original dictation but correct in later tests. The symbols refer to the type of test used. The design and data are given below.


Word       Groups of Children           Word       Groups of Children
Lists    I     II    III    IV          Lists    I     II    III    IV
A        MC    SD    WS     SK          A        81    41    44     53
B        SK    MC    SD     WS          B        38    97    42     49
C        WS    SK    MC     SD          C        31    43    ?      36
D        SD    WS    SK     MC          D        57    33    43     81


Our major interest is in the mean square between the various 
methods of testing spelling. Find the value of F. 




CHAPTER 17 


Applications of the Analysis of Covariance 


1. INTRODUCTION 


In a previous chapter¹ we have discussed an experimental
design appropriate to the situation where the experimenter is able 
to pretest subjects and to divide them into levels of initial ability 
upon the basis of their performance on the pretest. The subjects 
within each of the various levels of initial ability are then assigned 
at random to the experimental conditions. This design makes 
possible the breakdown of the total sum of squares into 3 parts. 
One part is that associated with differences between the experi- 
mental conditions, the significance of which is of primary interest. 
But instead of testing the significance of the mean square between 
the experimental conditions by the mean square based upon the 
variation within groups, the residual mean square within groups 
was used for this purpose. 

The residual mean square within groups represents the 
variation remaining after the variation attributable to differences 
between subjects of comparable levels of initial ability has been 
taken into account. To the extent to which the subjects of com- 
parable levels of initial ability tend to react similarly under the 
experimental conditions, then, this source of variation contributes 
to the variation within groups of subjects treated alike. By 
statistically isolating this source of variation and subtracting it 
from the sum of squares within groups, the mean square used as 
an error term in the test of significance can, in many instances, be 
substantially reduced. To this extent also, the design makes possible the detection of smaller differences between the experimental conditions, thus increasing the efficiency of the experiment.²

1 Chapter 14.

In many experiments, however, in which initial ability may 
be an important source of variation, it is not practical to give all 
of the subjects a pretest before assigning them to the experimental 
conditions. The subjects, for example, may be scheduled for but 
a single experimental period. It may be possible to test the sub- 
jects at this time, but the data on initial level would be collected 
during the experimental period and after assignment to a given 
experimental condition. 

In other experiments it may be possible to equalize subjects 
with respect to an initial variable, such as weight in a feeding 
experiment designed to test the effectiveness of several diets, but 
there may be other sources of variation which enter into the 
experimental condition and which are impossible to control experi- 
mentally. In the diet experiment, for example, the amount of food 
eaten by the animals may condition the gains in weight as well as 
the different diets. All the animals might be given exactly the 
same amounts of food, but this is no guarantee that all the animals 
will eat an equal amount. Records might be kept, however, of 
the amounts eaten, and the gains in weight might be statistically 
adjusted for this source of variation. 

Let us take still another example. Two groups of subjects 
might be equated as to academic ability. One group may then 
be given a course of instruction on efficient techniques of study. 
The other group may serve as a control, and at the end of an 
academic quarter the achievement of the 2 groups might be com- 
pared. A variable which might condition the achievement 
records of the subjects in both groups is the amount of time which 
they spent in study during the course of the academic quarter. 
In the experiment described, the variable amount of time spent in 
study would be extremely difficult to control experimentally. But 
records might be kept of the amount of time spent in study by 
each subject, and the achievement scores of the subjects might 
be statistically adjusted for this source of variation. 


2 Any a priori knowledge of a variable other than level of initial ability, such as age, which might be related to performance under the experimental conditions could be used for classifying the subjects before assigning them to the experimental conditions.


2. THE ANALYSIS OF COVARIANCE 


The method of analysis of experiments in which adjustments 
are made in the data for the experimental variable, in terms of 
data collected on another variable which may condition the out- 
comes of the experiment, is known as the analysis of covariance. 
The analysis of covariance is a synthesis of the methods of re- 
gression and the methods of the analysis of variance. The analysis 
of covariance is applicable to any experiment in which a source 
of variation, which it may not be possible to equalize between the 
various experimental groups prior to the experiment proper, can be 
measured. An adjustment is then made for this source of varia- 
tion in the analysis of the outcomes of the experiment. A case 
in point would be where levels of initial ability may condition the 
outcomes of the experiment, but where the subjects in the various 
groups have not been equated with respect to this variable prior 
to their assignment to the experimental conditions. If a record 
can be obtained of initial performance during the course of the 
experiment proper, the outcomes of the experiment may be ad- 
justed for this source of variation. 

The analysis of covariance is also applicable to experiments 
in which a source of variation arises during the course of the experi- 
ment. A case in point would be the feeding experiment where 
no advance knowledge as to amount of food which the animals will 
eat is available. But with records of the amounts eaten by each 
animal, the outcomes of the experiment may be adjusted for this 
source of variation. 

Let us take the very simple case of the analysis of variance 
where N subjects have been divided at random into several groups 
of n subjects each. We shall assume that each group has been subjected to a given experimental condition and that the measurements in which we are interested have been made under these conditions. The series of measurements for which we are interested in testing the significance of the differences between the experimental groups we shall call Y. During the course of the experiment we have also obtained a series of measurements, which we shall call X, that we have reason to believe may condition, or be related to, the value of Y.




3. CORRELATION AND REGRESSION 


From formula (33) we know that one method of expressing 
the product moment correlation coefficient between the 2 variables 
X and Y is 

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$$

If the paired values of X and Y are plotted in a correlation chart, 
and if the relationship between the 2 variables is rectilinear, then 
a “best-fitting”³ straight line can be drawn to represent the regression of the Y variable on the X variable. The slope of this line will be given by the regression coefficient of Y on X, for which we shall use the symbol b.⁴ The regression coefficient of Y on X may be written


$$b = r\frac{s_y}{s_x} = \frac{\sum xy}{\sum x^2} \qquad (73)$$

Let X and Y be expressed in terms of deviations from their respective means, x and y. Then predicted values $y'$ falling on the regression line for corresponding values of x may be obtained by means of the regression equation

$$y' = bx \qquad (74)$$

A deviation of the observed value of y from the predicted value $y'$ will be given by

$$y - y' = y - bx \qquad (75)$$

and it is easily demonstrated that $\sum (y - y') = \sum y - b\sum x$ will be equal to zero, since the deviations y and x each sum to zero.

It can also be shown that 


$$\sum (y - y')^2 = \sum (y - bx)^2 \qquad (76)$$


will be at a minimum, i.e., less than the sum of squared deviations 


3 The line will give the “best fit” in the sense that the sum of the squared deviations of the plotted points from this line will be less than from any other straight line.

4 In correlation work, the regression coefficient of Y on X is often written $b_{yx}$ to differentiate it from the regression coefficient of X on Y, which is often written $b_{xy}$. In experimental work involving the analysis of covariance, our interest will always be in the one regression coefficient, that of Y on X, and for convenience the subscripts will not be written.




from any other straight line for which the slope is not equal to b. 
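Then, by simple algebra, we may also obtain

$$\sum (y - y')^2 = \sum y^2 - 2b\sum xy + b^2\sum x^2$$

and, since $b = r s_y / s_x$ and $\sum xy = N r s_x s_y$,

$$s_{y \cdot x}^2 = \frac{\sum (y - y')^2}{N} = s_y^2 - 2r^2 s_y^2 + r^2 s_y^2 = s_y^2(1 - r^2) \qquad (77)$$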


The square root of the variance given by (77) above is usually written $s_{y \cdot x}$ and is commonly called the standard error of estimate, since it is a measure of the average errors of prediction, or scatter of the Y values around the regression line. The variance itself is often referred to as a residual variance in that it measures the remaining variation in Y after the portion which can be attributed to the regression of Y on X has been taken into account.
Our interest at the moment, however, is not in the residual variance as calculated above and as used in prediction problems with large samples, but rather in the sum of squares of errors of estimate on which it is based. From (77) above we may thus obtain


$$\sum (y - y')^2 = \sum y^2 (1 - r^2) = \sum y^2 - \frac{(\sum xy)^2}{\sum x^2} \qquad (78)$$


Formula (78) is important, for it shows clearly the necessary com- 
putations in obtaining the sum of squares based upon the variation 
remaining in Y after due allowance has been made for the regression 
of Y on X. It is this sum of squares of errors of estimate which 
is used to derive the error mean square in tests of significance in 
the analysis of covariance. The degrees of freedom available for this sum of squares will be equal to 1 less than the number of degrees of freedom available for the first term on the right, i.e., $\sum y^2$. If $\sum y^2$ represents the total sum of squares for Y, then it will have, as usual, N − 1 degrees of freedom, and the degrees of freedom for $\sum (y - bx)^2$ will be equal to N − 1 − 1 = N − 2. The additional degree of freedom is lost through the calculation of the regression coefficient b.
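Formulas (73) through (78) can be checked numerically. The Python sketch below (the function name `regression_errors` and the 5 pairs of scores are hypothetical) converts the scores to deviations, computes b, confirms that the errors $y - bx$ sum to zero, and shows that the sum of their squares equals $\sum y^2 - (\sum xy)^2 / \sum x^2$, with N − 2 degrees of freedom.

```python
def regression_errors(X, Y):
    """Regression of Y on X in deviation form, formulas (73) to (78)."""
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    x = [v - mx for v in X]                          # deviations from the means
    y = [v - my for v in Y]
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    b = sxy / sxx                                    # (73)
    errors = [yi - b * xi for xi, yi in zip(x, y)]   # y - y' = y - bx, (75)
    ss_direct = sum(e * e for e in errors)           # (76)
    ss_formula = syy - sxy ** 2 / sxx                # (78)
    return b, sum(errors), ss_direct, ss_formula, n - 2


b, total_error, ss_direct, ss_formula, df = regression_errors(
    [1, 2, 3, 4, 5], [2, 4, 5, 4, 5])                # hypothetical scores
print(b, round(total_error, 12), ss_direct, ss_formula, df)
```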


4. PARTITIONING THE TOTAL SUM OF CROSS PRODUCTS 


The term $\sum xy$ is usually called the sum of cross products or
product sum and when divided by the appropriate number of 
degrees of freedom is called a covariance. Our next step in the 
development of the analysis of covariance is similar to the develop- 
ment we used with the analysis of variance. Just as we showed 
there that it was possible to analyze the total sum of squares into 
various component parts, so also we may show that with respect 
to the total sum of cross products, when we have several groups 
of n subjects each available, it is possible to analyze this term into 
various components. 

Let us take our N subjects who have been divided at random 
into r samples of n subjects each. For each subject we have a value of X. Let $m_x$ represent the mean of all N observations and $\bar X_i$ the mean of the ith sample. Let a deviation of X from the mean of the sample of which it is a part be represented by $x_i$, and let the deviation of the same measure from the mean of all N cases be represented by x. Thus, $x_i = X - \bar X_i$ and $x = X - m_x$. Let $d_{ix} = \bar X_i - m_x$. A similar notation may be used for the values of Y which are paired with the values of X which we have. Then, considering but a single sample, we have


$$\begin{aligned}
x &= X - m_x \\
x - x_i &= X - m_x - (X - \bar X_i) \\
x &= x_i + X - m_x - X + \bar X_i \\
x &= x_i + d_{ix}
\end{aligned}$$

In the same way it can be shown that for each value of Y paired with each value of X in the same sample, we may write $y = y_i + d_{iy}$. Then, multiplying the paired values of x and y and summing over the n values in the sample, we obtain


$$xy = (x_i + d_{ix})(y_i + d_{iy})$$

$$\sum^{n} xy = \sum^{n} x_i y_i + d_{ix}\sum^{n} y_i + d_{iy}\sum^{n} x_i + n\,d_{ix}d_{iy}$$

and since the values of $\sum^{n} y_i$ and $\sum^{n} x_i$ will be equal to zero, we have

$$\sum^{n} xy = \sum^{n} x_i y_i + n\,d_{ix}d_{iy}$$


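Then summing over all r samples, we obtain

$$\sum^{r}\sum^{n} xy = \sum^{r}\sum^{n} x_i y_i + \sum^{r} n\,d_{ix}d_{iy} \qquad (79)$$

which may also be written

$$\sum^{N} xy = \sum^{r}\sum^{n} (X - \bar X_i)(Y - \bar Y_i) + \sum^{r} n(\bar X_i - m_x)(\bar Y_i - m_y) \qquad (80)$$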


We see from formula (80) above that in the simplest case 
where we have N subjects divided at random into r groups of n
observations each and with measures of both X and Y available
for each subject, the total sum of cross products may be analyzed
into 2 parts. The first term on the right of formula (80) will
represent the sum of cross products based upon deviations of the
values from the means of the samples of which they are a part.
This portion of the total sum of cross products may be called the
sum of cross products within groups as it corresponds to the sum of 
squares within groups. The second term on the right of formula 
(80) will represent the sum of cross products based upon the 
deviations of the means of the samples from the combined means. 
This portion of the total sum of cross products corresponds to the 
sum of squares between groups and may be called the sum of cross 
products between groups. 
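The partition in formula (80) can be verified directly from deviations. The following Python sketch uses 2 small hypothetical groups of 3 subjects each (the data and the name `partition_cross_products` are illustrative assumptions); the within-groups and between-groups parts add up to the total sum of cross products.

```python
def partition_cross_products(groups):
    """Split the total sum of cross products into within and between parts,
    following formula (80). `groups` is a list of (X list, Y list) pairs."""
    all_x = [v for xs, _ in groups for v in xs]
    all_y = [v for _, ys in groups for v in ys]
    mx, my = sum(all_x) / len(all_x), sum(all_y) / len(all_y)
    total = sum((a - mx) * (b - my) for a, b in zip(all_x, all_y))
    within = between = 0.0
    for xs, ys in groups:
        n = len(xs)
        gx, gy = sum(xs) / n, sum(ys) / n              # group means
        within += sum((a - gx) * (b - gy) for a, b in zip(xs, ys))
        between += n * (gx - mx) * (gy - my)
    return total, within, between


groups = [([1, 2, 3], [2, 3, 5]), ([4, 6, 5], [4, 7, 6])]   # hypothetical
total, within, between = partition_cross_products(groups)
print(f"total {total:.2f} = within {within:.2f} + between {between:.2f}")
```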

Formula (80) does not represent the most convenient method 
of calculating the sums of cross products. Instead, it is easier to 
take the values of X and Y from zero origin and to apply a cor- 
rection term to the products of the original values. Thus, for the 
total sum of cross products, we have 


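$$\sum^{N} xy = \sum^{N} XY - \frac{(\sum^{N} X)(\sum^{N} Y)}{N} \qquad (81)$$

and the sum of cross products within any single group will be given by

$$\sum^{n} (X - \bar X_i)(Y - \bar Y_i) = \sum^{n} XY - \frac{(\sum^{n} X)(\sum^{n} Y)}{n} \qquad (82)$$

and summing over all r groups will give the sum of cross products within groups.

The sum of cross products between groups will be given by

$$\sum^{r} n(\bar X_i - m_x)(\bar Y_i - m_y) = \sum^{r} \frac{(\sum^{n} X)(\sum^{n} Y)}{n} - \frac{(\sum^{N} X)(\sum^{N} Y)}{N} \qquad (83)$$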
1 


As in the case of the analysis of the total sum of squares, the direct 
calculation of all 3 of the cross-product terms is not necessary. 
Instead, the total sum of cross products may be obtained and the 
sum of cross products between groups. Then the sum of cross 
products within groups may be obtained by subtraction. 
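The short-cut procedure just described, with the total from formula (81), the between-groups term from formula (83), and the within-groups term by subtraction, can be sketched in Python as follows. The function name and the 2 hypothetical groups are illustrative assumptions; the results agree with those obtained from deviations by formula (80).

```python
def cross_products_raw(groups):
    """Total and between-groups sums of cross products from raw scores,
    formulas (81) and (83); within groups obtained by subtraction."""
    all_x = [v for xs, _ in groups for v in xs]
    all_y = [v for _, ys in groups for v in ys]
    N = len(all_x)
    correction = sum(all_x) * sum(all_y) / N
    total = sum(a * b for a, b in zip(all_x, all_y)) - correction       # (81)
    between = sum(sum(xs) * sum(ys) / len(xs)
                  for xs, ys in groups) - correction                    # (83)
    return total, between, total - between


groups = [([1, 2, 3], [2, 3, 5]), ([4, 6, 5], [4, 7, 6])]   # hypothetical
print(cross_products_raw(groups))
```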


5. THE SUMS OF SQUARES OF ERRORS OF ESTIMATE 


Now it is clear that since we have 3 different sums of cross
products, and since we also have 3 corresponding sums of squares
for X, it would be possible to calculate 3 regression coefficients⁵
for the regression of Y on X. One of these would be based upon 
the total sum of cross products and the total sum of squares for X. 
Then, substituting this regression coefficient in formula (78), we 
could obtain a sum of squares of errors of estimate. The degrees 
of freedom for this sum of squares would be 1 less than the number 
of degrees of freedom for the total sum of squares for Y, since the 
calculation of the regression coefficient accounts for 1 degree of 
freedom. 

Similarly, a second sum of squares of errors of estimate within 
groups could be obtained by using the regression coefficient based 
upon the sum of cross products within groups and the sum of 
squares within groups. This sum of squares of errors of estimate 
within groups would have 1 degree of freedom less than the sum 
of squares within groups, the additional degree of freedom being 
lost through the calculation of the regression coefficient. 


5 See formula (73). 




Finally, a sum of squares of errors of estimate between groups
could be calculated using the regression coefficient based upon the 
sum of cross products between groups and the sum of squares 
between groups. This sum of squares would have 1 degree of 
freedom less than the number of degrees of freedom between groups, 
as again the calculation of the regression coefficient would result 
in the loss of an additional degree of freedom. 


6. AN APPLICATION OF THE ANALYSIS OF COVARIANCE 


All the necessary calculations for applying the analysis of 
covariance have now been described, and the actual operations are 
much simpler than the discussion above may indicate. As an 
illustration of the arithmetic involved, let us suppose that 20 
subjects have been divided at random into 4 groups of 5 subjects 
each. The members of each group are subjected to different 
experimental conditions which we shall refer to as A, B, C, and 
D. We are interested in performance under the experimental 
conditions as measured by a variable that we shall call Y. The 
Y measures might represent steadiness scores obtained with a 
stylus maze, and the experimental conditions might represent 
different drugs, different room temperatures, different humidities, 
different degrees of noise, and so on. Prior to obtaining the 
measures of Y under the experimental conditions, each subject is 
given a preliminary series of trials with the stylus maze and these 
preliminary measures we shall call X. 

The scores of the subjects on X, the initial measure, are 
recorded in Table 75, and the scores of the subjects under the 


TABLE 75. Scores of 4 Groups of Subjects on a Preliminary Trial: X

              Groups
         A     B     C     D
         4     5    10    11
         8     7     8    12
        14    12    11     8
        11     9     4    10
         5    12     7     7
Sum     42    45    40    48




TABLE 76. Scores of 4 Groups of Subjects Tested under Different
Experimental Conditions: Y

              Groups
         A     B     C     D
         6     5    13    11
         9     7    12    17
        11    13    16    10
         9    12     9    13
         9    14    11     9
Sum     44    51    61    60


experimental conditions Y are recorded in Table 76. The position 
of the scores in Table 75 corresponds directly with the position of 
the scores in Table 76. For example, the subject in Group A 
who has an initial score on X of 4 has a score under the experimental condition Y of 6. The other scores are paired in a similar fashion.

As a first step in the application of the covariance technique 
to the experiment, we may analyze the data for X in the usual 
manner of an analysis of variance. The sums of squares will be 
given by 


$$\text{Total} = (4)^2 + (8)^2 + (14)^2 + \cdots + (7)^2 - \frac{(175)^2}{20} = 161.75$$

$$\text{Between} = \frac{(42)^2}{5} + \frac{(45)^2}{5} + \frac{(40)^2}{5} + \frac{(48)^2}{5} - \frac{(175)^2}{20} = 7.35$$
Within = total — between = 161.75 — 7.35 = 154.40 


The summary of this analysis is given in Table 77 and it is 


TABLE 77. Analysis of Variance of Scores of 4 Groups of Subjects on 
Preliminary Trial: X 


Source of Variation Sum of Squares df Mean Square F 
Between groups 7.35 3 2.45 
Within groups 154.40 16 9.65 

Total 161.75 19 




obvious that though there is some variation in the mean values of 
the various groups, this is not significant, as the mean square 
between groups is less than that within groups. Since the subjects 
have been assigned at random to the various groups and have not 
as yet been subjected to the experimental conditions, we may 
assume that the measures of X were obtained under the same 
condition and thus nothing but random variation is to be expected 
between the means. 

Now, let us make exactly the same analysis for the Y meas- 
ures. The necessary sums of squares will be given by 


$$\text{Total} = (6)^2 + (9)^2 + (11)^2 + \cdots + (9)^2 - \frac{(216)^2}{20} = 181.20$$

$$\text{Between} = \frac{(44)^2}{5} + \frac{(51)^2}{5} + \frac{(61)^2}{5} + \frac{(60)^2}{5} - \frac{(216)^2}{20} = 38.80$$
Within = total — between = 181.20 — 38.80 = 142.40 


The summary of the analysis of the Y measures is given in 
Table 78. The mean square between groups, when tested by the 


TABLE 78. Analysis of Variance of Scores of 4 Groups of Subjects Tested 
under Different Experimental Conditions: Y 


Source of Variation Sum of Squares df Mean Square F 
Between groups 38.80 3 12.93 1.45 
Within groups 142.40 16 8.90 

Total 181.20 19 


———— 


mean square within groups results in an F of 1.45. From the table 
of F we find that for 3 and 16 degrees of freedom this value has a 
probability greater than .05, and hence the hypothesis that the 
groups are random samples from a common population must be 
considered tenable. 
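The two analyses of variance summarized in Tables 77 and 78 can be reproduced with a few lines of Python. This is a sketch under the assumption of equal group sizes, as in the present example; the function name `one_way_anova` is hypothetical, and the group lists follow Tables 75 and 76.

```python
def one_way_anova(groups):
    """Total, between, and within sums of squares and F for equal-sized groups."""
    scores = [v for g in groups for v in g]
    N, n, k = len(scores), len(groups[0]), len(groups)
    cf = sum(scores) ** 2 / N                         # correction term
    total = sum(v * v for v in scores) - cf
    between = sum(sum(g) ** 2 for g in groups) / n - cf
    within = total - between
    ms_between, ms_within = between / (k - 1), within / (N - k)
    return total, between, within, ms_between / ms_within


# Tables 75 and 76, one list per group (A, B, C, D).
X = [[4, 8, 14, 11, 5], [5, 7, 12, 9, 12], [10, 8, 11, 4, 7], [11, 12, 8, 10, 7]]
Y = [[6, 9, 11, 9, 9], [5, 7, 13, 12, 14], [13, 12, 16, 9, 11], [11, 17, 10, 13, 9]]

for name, data in (("X", X), ("Y", Y)):
    total, between, within, F = one_way_anova(data)
    print(f"{name}: total {total:.2f}, between {between:.2f}, "
          f"within {within:.2f}, F {F:.2f}")
```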

The next step is to analyze the total sum of cross products 
in exactly the same manner in which we have analyzed the total 
sums of squares for X and Y. The necessary sums of cross products will be given by

$$\text{Total} = (4)(6) + (8)(9) + (14)(11) + \cdots + (7)(9) - \frac{(175)(216)}{20} = 123.00$$

$$\text{Between} = \frac{(42)(44)}{5} + \frac{(45)(51)}{5} + \frac{(40)(61)}{5} + \frac{(48)(60)}{5} - \frac{(175)(216)}{20} = 2.60$$

Within = total − between = 123.00 − 2.60 = 120.40
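The same computation can be sketched in Python from the raw scores of Tables 75 and 76. The function name `cross_products` is hypothetical; as in the text, the within-groups sum of cross products is obtained by subtraction.

```python
def cross_products(X, Y):
    """Total, between-groups, and within-groups sums of cross products."""
    xs = [v for g in X for v in g]
    ys = [v for g in Y for v in g]
    N, n = len(xs), len(X[0])
    correction = sum(xs) * sum(ys) / N
    total = sum(a * b for a, b in zip(xs, ys)) - correction
    between = sum(sum(gx) * sum(gy) for gx, gy in zip(X, Y)) / n - correction
    return total, between, total - between


# Tables 75 and 76, one list per group (A, B, C, D).
X = [[4, 8, 14, 11, 5], [5, 7, 12, 9, 12], [10, 8, 11, 4, 7], [11, 12, 8, 10, 7]]
Y = [[6, 9, 11, 9, 9], [5, 7, 13, 12, 14], [13, 12, 16, 9, 11], [11, 17, 10, 13, 9]]

total, between, within = cross_products(X, Y)
print(f"total {total:.2f}, between {between:.2f}, within {within:.2f}")
```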


The essential parts of the analysis up to this point are sum- 
marized in Table 79. From the data of the table, we first calculate 
the sum of squares of errors of estimate for the total by means of 


TABLE 79. Sums of Squares and Cross Products for 4 Groups of Subjects
on Preliminary Trial X and under Experimental Conditions Y

Source of Variation    df      Σx²       Σxy       Σy²
Between groups          3      7.35      2.60     38.80
Within groups          16    154.40    120.40    142.40

Total                  19    161.75    123.00    181.20


formula (78). Thus, the total sum of squares of errors of estimate 
will be given by 
$$\text{Total} = 181.20 - \frac{(123.00)^2}{161.75} = 87.67$$

By the same formula, we obtain the residual sum of squares or sum of squares of errors of estimate within groups. Thus,

$$\text{Within} = 142.40 - \frac{(120.40)^2}{154.40} = 48.51$$
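Applying formula (78) to the entries of Table 79 can be sketched in Python as follows. The function name is hypothetical; the sketch also prints the regression coefficient for each line and the portion of $\sum y^2$ accounted for by the regression, and each sum of squares of errors of estimate loses 1 degree of freedom for the regression coefficient.

```python
def errors_of_estimate(ss_y, sp, ss_x):
    """Formula (78): sum of squares of Y left after the regression on X."""
    return ss_y - sp ** 2 / ss_x


# Entries of Table 79: (df, sum x^2, sum xy, sum y^2).
table_79 = {"between": (3, 7.35, 2.60, 38.80),
            "within": (16, 154.40, 120.40, 142.40),
            "total": (19, 161.75, 123.00, 181.20)}

for source in ("total", "within"):
    df, ss_x, sp, ss_y = table_79[source]
    b = sp / ss_x                                   # regression coefficient
    resid = errors_of_estimate(ss_y, sp, ss_x)
    print(f"{source}: b = {b:.3f}, explained = {ss_y - resid:.2f}, "
          f"errors of estimate = {resid:.2f} with {df - 1} df")
```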


Now, before finding the sum of squares of errors of estimate 
between groups, let us examine the nature of the sum of squares 
of errors of estimate within groups. This sum of squares, in terms 
of our earlier discussion, is free from any influence of differences 
in the means of the various groups. We have already pointed out 
that the sum of squares within groups, 142.40 in this instance, is 
independent of the sum of squares between groups, and it is also 
true that the sum of cross products within groups is independent 


es 


| 


APPLICATIONS OF THE ANALYSIS OF COVARIANCE 345 


of the product sum between the means of the groups which has 
been taken into account in the calculations of line 1 of Table 79. 

The sum of squares of errors of estimate within groups repre- 
sents the remaining variation within groups on the Y variable 
after the regression of Y on X has been taken into account. An 
examination of formula (78), for example, shows that of the sum 
of squares 142.40 the portion 93.89 can be accounted for by the 
regression of Y on X, leaving the residual 48.51 representing the 
uncontrolled variation between subjects treated alike. It is this 
sum of squares which we shall use, therefore, to obtain the mean 
square to be used as the error term in testing the significance of 
the difference between the experimental conditions. 

The sum of squares of errors of estimate within groups will have 15 degrees of freedom in the present problem, which is 1 less than the 16 degrees of freedom available for the within-groups sum of squares. The additional degree of freedom is lost, as pointed out before, in the calculation of the regression coefficient. The degrees of freedom for the sum of squares of errors of estimate for total will be 1 less than the number of degrees of freedom for the total sum of squares, an additional degree of freedom being lost here also by the calculation of the regression coefficient for the total. Thus, in the present problem, the degrees of freedom for this sum of squares will be equal to 18.

Instead of calculating a sum of squares of errors of estimate between groups, using the regression coefficient which would be obtained from the product sum and sum of squares between groups and applying formula (78), we shall obtain an "adjusted" sum of squares between groups by taking the difference between the 2 sums of squares of errors of estimate which we have already calculated.⁵ Thus,

$$
\left(\begin{array}{c}\text{sum of squares of}\\ \text{errors of estimate}\\ \text{for total}\end{array}\right)
-
\left(\begin{array}{c}\text{sum of squares of}\\ \text{errors of estimate}\\ \text{within groups}\end{array}\right)
=
\left(\begin{array}{c}\text{adjusted sum of}\\ \text{squares between}\\ \text{groups}\end{array}\right)
\qquad (84)
$$

⁵ This method yields the appropriate "adjusted" sum of squares between groups from which the mean square to be tested for significance will be derived, making allowance for the sampling errors of b. See Fisher (1942), pp. 171-172.

Substituting the necessary values in formula (84) above, we obtain

$$87.67 - 48.51 = 39.16$$


Since we have not calculated a new regression coefficient in obtain- 
ing the adjusted sum of squares between groups, no additional 
degree of freedom is lost. We shall still have the original number 
of degrees of freedom for this sum of squares, which is equal to 3. 
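Formula (84) can be checked numerically. Formula (78) is not reproduced in this section, so this sketch assumes its usual form, Σy² − (Σxy)²/Σx². The between-groups entries of Table 79 (Σx² = 7.35, Σxy = 2.60, Σy² = 38.80) are taken from the correlation between means computed later in this section, and the totals are formed by adding them to the within-groups entries, which the additivity of sums of squares and products allows. The function name is hypothetical.

```python
def errors_of_estimate(sum_x2: float, sum_xy: float, sum_y2: float) -> float:
    """Sum of squares of errors of estimate, assuming formula (78) is
    sum_y2 - sum_xy**2 / sum_x2."""
    return sum_y2 - sum_xy ** 2 / sum_x2


# (sum x^2, sum xy, sum y^2) for the 4 groups of 5 subjects in Table 79
within = (154.40, 120.40, 142.40)   # 16 df
between = (7.35, 2.60, 38.80)       # 3 df
total = tuple(w + b for w, b in zip(within, between))  # 19 df

ee_total = errors_of_estimate(*total)     # 19 - 1 = 18 df
ee_within = errors_of_estimate(*within)   # 16 - 1 = 15 df
adjusted_between = ee_total - ee_within   # formula (84); keeps its 3 df

print(f"total: {ee_total:.2f}, within: {ee_within:.2f}, adjusted: {adjusted_between:.2f}")
```

Carried to more decimal places, the adjusted sum of squares is about 39.15; the 39.16 in the text is the difference of the rounded values 87.67 and 48.51.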


7. INTERPRETATION OF THE ANALYSIS 


The analysis is summarized in Table 80.

TABLE 80. Analysis of Covariance of Performance of 4 Groups of Subjects

                           Sum of Squares of
Source of Variation       Errors of Estimate   df   Mean Square      F
Total                                  87.67   18
Within groups                          48.51   15          3.23
Adjusted means                         39.16    3         13.05   4.04

The value of F for the test of significance of the adjusted means is obtained by dividing the mean square for the adjusted means, 13.05, by the mean square for the errors of estimate within groups, 3.23. The obtained value of F, 4.04, is based upon 3 and 15 degrees of freedom, and from the table of F we find that it has a probability of less than .05 and is thus significant. This significant value of F indicates that the differences in the means of the experimental groups on the Y variable cannot be accounted for by differences in mean level of initial ability as measured by X in the preliminary trials; for the means of the groups on the Y variable have been "adjusted" by
the analysis to a common mean initial level of performance on X. 
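Where no printed table of F is at hand, the probability for the obtained F can be computed directly. The sketch below is a swapped-in numerical technique, not the procedure of the text: it uses only the Python standard library and evaluates the upper tail of the F distribution through the regularized incomplete beta function, computed by the continued-fraction (modified Lentz) method familiar from Numerical Recipes. The function names are illustrative.

```python
import math


def _beta_continued_fraction(a: float, b: float, x: float,
                             max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even term of the continued fraction
        aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # odd term of the continued fraction
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a: float, b: float, x: float) -> float:
    """I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    # symmetry I_x(a, b) = 1 - I_{1-x}(b, a) gives faster convergence here
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_upper_tail(f: float, df1: int, df2: int) -> float:
    """Probability of an F at least as large as f with df1 and df2 degrees of freedom."""
    return regularized_incomplete_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


# Table 80: adjusted means 39.16 on 3 df, errors of estimate within groups 48.51 on 15 df
f_ratio = (39.16 / 3) / (48.51 / 15)
p = f_upper_tail(f_ratio, 3, 15)
print(f"F = {f_ratio:.2f} on 3 and 15 df, P = {p:.3f}")
```

For Table 80 the computed probability falls between .01 and .05, in agreement with the conclusion drawn from the table of F.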

From the data of Table 79 it is possible to compute the correlation coefficient within groups and also the correlation coefficient between the means of the groups on X and Y. The correlation coefficient within groups will be given by

$$r_{xy(\text{within})} = \frac{120.40}{\sqrt{(154.40)(142.40)}} = .81$$

and the correlation coefficient between the means will be given by

$$r_{xy(\text{means})} = \frac{2.60}{\sqrt{(7.35)(38.80)}} = .15$$


From the correlation coefficient within groups, which is equal to 
.81, we may note that there is a decided tendency for subjects who 
were high in initial level of performance (X) to also be high when 
tested under a given experimental condition (Y). On the other 
hand, the rather low correlation coefficient of .15 between the 
means indicates that there is no pronounced tendency for the 
groups with the higher initial means on X to have higher means on 
Y, i.e., when tested under the experimental conditions. 
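The two coefficients can be reproduced from the sums of squares and products that appear in the formulas above; the helper name is illustrative.

```python
import math


def correlation(sum_xy: float, sum_x2: float, sum_y2: float) -> float:
    """Product-moment correlation from a sum of products and two sums of squares."""
    return sum_xy / math.sqrt(sum_x2 * sum_y2)


r_within = correlation(120.40, 154.40, 142.40)  # within groups, Table 79
r_means = correlation(2.60, 7.35, 38.80)        # between the group means
print(f"within groups: {r_within:.2f}, between means: {r_means:.2f}")
```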

A comparison of these 2 correlation coefficients should give 
a good indication of the conditions under which an analysis of 
covariance will prove efficient in detecting differences between the 
means of the groups on the Y variable. Obviously, the greater 
the correlation coefficient within groups, the smaller the sum of 
squares of errors of estimate within groups will be, and conse- 
quently the smaller the mean square which will be used in testing 
the significance of the differences for the adjusted means. The 
reduction in the mean square which will be used for error will be 
given by a variation of formula (77). Thus, 


$$\text{sum of squares of errors of estimate within groups} = (\text{sum of squares within groups})\,\bigl(1 - r^2_{xy(\text{within})}\bigr)$$


From this variation we see that if the correlation within 
groups is as high as .70, it would result in a reduction of approxi- 
mately 50 per cent, and a correlation coefficient of .50 would result 
in a reduction of approximately 25 per cent in the error mean 
square. Differences between experimental conditions, which 
might not be detected by an analysis of variance of the Y measures 
alone, might very well be significant when tested against the mean 
square for error of the analysis of covariance, when the correlation 
within groups is substantial. 
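The percentages quoted above follow from the factor 1 − r² applied to the within-groups sum of squares. Because the error mean square of the covariance analysis has 1 degree of freedom fewer, the reduction in the mean square is slightly smaller than the reduction in the sum of squares. The sketch below, whose function name is hypothetical, shows both and checks them against the example of Table 79 (16 degrees of freedom within groups).

```python
import math


def error_mean_square_ratio(r_within: float, df_within: int) -> float:
    """Ratio of the covariance error mean square to the analysis-of-variance
    error mean square: (1 - r**2) * df_within / (df_within - 1)."""
    return (1.0 - r_within ** 2) * df_within / (df_within - 1)


for r in (0.50, 0.70):
    print(f"r = {r:.2f}: sum of squares reduced by {100 * r ** 2:.0f} per cent")

# Example of Table 79: within-groups sum of squares 142.40 on 16 df
r_example = 120.40 / math.sqrt(154.40 * 142.40)
ms_variance = 142.40 / 16
ms_covariance = ms_variance * error_mean_square_ratio(r_example, 16)
print(f"error mean square: {ms_variance:.2f} -> {ms_covariance:.2f}")
```

With groups of moderate size the degrees-of-freedom factor is close to 1, so the percentages in the text are good approximations.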

For the same purpose of detecting differences in the means of 
the groups under the experimental conditions, it is desirable for
the sum of cross products between the means to be as near zero
as possible. Under this condition there will be no correlation 
between the means of Y and the means of X. If, at the same time, 
there is correlation within groups, then the total sum of squares for 
Y and the within-groups sum of squares for Y will be reduced by 
equal amounts, but there will be no reduction in the sum of squares 
between groups on the Y variable. Hence, while the error mean 
square of the covariance analysis will be reduced, compared with 
the error mean square of the analysis of variance, the mean square 
between groups will remain unchanged. Thus, the value of F 
obtained from the covariance analysis may be significant while 
that obtained from the analysis of variance may not. This was 
the case with the analysis just described. 

With fairly large groups and random assignment of the sub- 
jects to the groups, we would not expect the means of the groups 
to vary as much on the measure of initial ability as would be the 
case with smaller groups. The more nearly alike the groups are 
in their mean level of performance on the initial measure X, the 
smaller the sum of cross products between the means must be. In 
the limiting case with exactly equal means on X, the sum of cross 
products between the means must necessarily be zero and the cor- 
relation zero also. Any differences between the means of the
groups on the Y variable then could not possibly be accounted for by,
or influenced by, the means of X, but it will still be possible for
correlation to be present within groups, resulting in a reduced 
error term in the test of significance. 

These principles apply, of course, to the situation where X is 
some measure of initial ability or other variable which will be 
positively correlated with performance under the experimental 
conditions. In experiments of this nature, the analysis of co- 
variance has as its primary purpose (1) the increased precision in 
the test of significance through the reduction of the mean square 
used as the error term by the regression of Y on X, and, as a
secondary objective, (2) making allowances for differences between
the means of the groups on the X variable. With large groups 
and random assignment, the secondary objective becomes of 
less importance for reasons given earlier. 




8. ANOTHER APPLICATION OF COVARIANCE ANALYSIS 


There are other experiments, however, in which the relative importance of the two objectives listed above is reversed. These are experiments in which the analysis of covariance is used to determine the extent to which differences in X, arising during the
course of the experiment, influence the outcomes of the experiment 
as measured by Y. An example would be the case cited earlier on 
the effectiveness of a course of instruction in how to study. The 
groups might be equated with respect to academic ability, but one 
group might spend much more time in study than the other group. 
In this experiment the question which the covariance analysis 
seeks to answer is whether the difference in achievement between 
the groups is primarily the result of the course of instruction (the 
experimental condition) or of the number of hours spent in study. 
Did the groups gain differently in achievement because of differ- 
ences in the number of hours spent in study or in spite of such 
differences? 

The application of the analysis of covariance to an experiment 
such as that described will now be considered. In so doing we shall 
take a design which we have used before. Let us suppose that we 
have 50 subjects and that it was possible to give a pretest of initial 
ability and that the subjects were divided into 10 levels of 5 sub- 
jects each on the basis of the pretest. In the course of the experi- 
ment, observations were also made of a variable X which we have 
reason to believe might condition the outcomes on the variable Y 
in which our main interest centers. The design is thus that of 
matched subjects across rows, but in addition we have measures 
of a second variable X. The data of the experiment are given in 
Table 81. 

TABLE 81. Scores of 5 Groups of Subjects Matched Across Rows on the Basis of Initial Ability

Observations of Variable X Made during Course of Experiment

Levels of
Initial
Ability     Group 1   Group 2   Group 3   Group 4   Group 5     Sum
   1             13         2         7         3         5      30
   2              7         3        11         4         6      31
   3             10         5        11         5         7      38
   4              5         4         9         6         9      33
   5              8         8        12         7         6      41
   6             10         7        11         7         8      43
   7              9         9        13         8         6      45
   8             11         9         8         9        10      47
   9              5        12        11        10        11      49
  10              7        11         9        11        12      50
 Sum             85        70       102        70        80     407

Observations on Variable Y, the Experimental Variable

Levels of
Initial
Ability     Group 1   Group 2   Group 3   Group 4   Group 5     Sum
   1             13         8         8         8         7      44
   2              7        13         9         8        10      47
   3              9         7         8         7         9      40
   4              4         4        10         9        13      40
   5             10        14        12        11         8      55
   6              7        12        13        11        13      56
   7              6        14        18        13         9      60
   8             12         8        16        15        15      66
   9              6         9        19        10        14      58
  10              7         5        17        18        17      64
 Sum             81        94       130       110       115     530

A straightforward analysis of variance for the Y variable would give a sum of squares for levels of initial ability (rows), groups (columns), and a residual sum of squares within groups. These sums of squares will be given by

$$\text{Total} = (13)^2 + (7)^2 + (9)^2 + \cdots + (17)^2 - \frac{(530)^2}{50} = 708.00$$

$$\text{Columns} = \frac{(81)^2}{10} + \frac{(94)^2}{10} + \cdots + \frac{(115)^2}{10} - \frac{(530)^2}{50} = 144.20$$

$$\text{Rows} = \frac{(44)^2}{5} + \frac{(47)^2}{5} + \cdots + \frac{(64)^2}{5} - \frac{(530)^2}{50} = 166.40$$

$$\text{Residual within groups} = 708.00 - 144.20 - 166.40 = 397.40$$

A summary of this analysis of variance of the Y measures is given in Table 82.

TABLE 82. Analysis of Variance of Y Variable for the 5 Groups of Matched Subjects

Source of Variation               Sum of Squares   df   Mean Square       F
Between groups (columns)                  144.20    4        36.050   3.266
Between levels (rows)                     166.40    9        18.489   1.675
Residual within groups (error)            397.40   36        11.039
Total                                     708.00   49

The value of F obtained for the significance of the mean square between groups is 3.266, and for 4 and 36 degrees of freedom this has a probability of less than .05 and may be regarded as significant. The hypothesis of random sampling from
a common population would be rejected, and we would conclude 
that the means of the groups on the Y variable differ significantly. 
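The sums of squares of Table 82 can be recomputed from the Y scores of Table 81. The sketch below is a plain two-way (rows × columns) partition with one observation per cell, written with the Python standard library; the function name is illustrative, and the data are the Y observations of Table 81 transcribed row by row.

```python
# Y observations of Table 81: 10 levels of initial ability (rows) x 5 groups (columns)
Y = [
    [13, 8, 8, 8, 7],
    [7, 13, 9, 8, 10],
    [9, 7, 8, 7, 9],
    [4, 4, 10, 9, 13],
    [10, 14, 12, 11, 8],
    [7, 12, 13, 11, 13],
    [6, 14, 18, 13, 9],
    [12, 8, 16, 15, 15],
    [6, 9, 19, 10, 14],
    [7, 5, 17, 18, 17],
]


def two_way_sums_of_squares(table):
    """Partition the total sum of squares of a rows x columns table with one
    observation per cell into columns, rows, and residual."""
    n_rows, n_cols = len(table), len(table[0])
    correction = sum(map(sum, table)) ** 2 / (n_rows * n_cols)
    total = sum(v * v for row in table for v in row) - correction
    col_totals = [sum(row[j] for row in table) for j in range(n_cols)]
    row_totals = [sum(row) for row in table]
    columns = sum(t * t for t in col_totals) / n_rows - correction
    rows = sum(t * t for t in row_totals) / n_cols - correction
    ss = {"columns": columns, "rows": rows,
          "residual": total - columns - rows, "total": total}
    df = {"columns": n_cols - 1, "rows": n_rows - 1,
          "residual": (n_rows - 1) * (n_cols - 1), "total": n_rows * n_cols - 1}
    return ss, df


ss, df = two_way_sums_of_squares(Y)
ms = {k: ss[k] / df[k] for k in ("columns", "rows", "residual")}
for k in ("columns", "rows"):
    print(f"{k:>8}: SS = {ss[k]:7.2f}, df = {df[k]}, MS = {ms[k]:.3f}, "
          f"F = {ms[k] / ms['residual']:.3f}")
print(f"residual: SS = {ss['residual']:7.2f}, df = {df['residual']}, MS = {ms['residual']:.3f}")
```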

The same analysis may be applied to the X variable, the necessary sums of squares being given by

$$\text{Total} = (13)^2 + (7)^2 + (10)^2 + \cdots + (12)^2 - \frac{(407)^2}{50} = 378.02$$

$$\text{Columns} = \frac{(85)^2}{10} + \frac{(70)^2}{10} + \frac{(102)^2}{10} + \frac{(70)^2}{10} + \frac{(80)^2}{10} - \frac{(407)^2}{50} = 69.92$$

$$\text{Rows} = \frac{(30)^2}{5} + \frac{(31)^2}{5} + \cdots + \frac{(50)^2}{5} - \frac{(407)^2}{50} = 98.82$$

$$\text{Residual within groups} = 378.02 - 69.92 - 98.82 = 209.28$$


The results of the analysis of variance of the X variable are given in Table 83.

TABLE 83. Analysis of Variance of the X Variable for the 5 Groups of Matched Subjects

Source of Variation               Sum of Squares   df   Mean Square       F
Between groups (columns)                   69.92    4        17.480   3.007
Between levels (rows)                      98.82    9        10.980   1.889
Residual within groups (error)            209.28   36         5.813
Total                                     378.02   49

This analysis indicates that the mean square between groups is significant and raises a rather important question concerning the differences found in the analysis of variance of the Y variable. To what extent can the significant mean square between groups on the Y variable be accounted for by differences between the groups on the X variable? If the test of significance of the Y variable is adjusted for the differences existing on the X variable, will the same conclusion concerning significance be reached? Will a significant difference between groups on the Y variable still be present in spite of the differences observed on X?

To answer the above questions, the experimenter would turn to an analysis of covariance. This will require the computation of the necessary product sums corresponding to the analyses of variance of X and Y. These will be given by

$$\text{Total} = (13)(13) + (7)(7) + \cdots + (12)(17) - \frac{(407)(530)}{50} = 260.80$$

$$\text{Columns} = \frac{(85)(81)}{10} + \frac{(70)(94)}{10} + \cdots + \frac{(80)(115)}{10} - \frac{(407)(530)}{50} = 48.30$$

$$\text{Rows} = \frac{(30)(44)}{5} + \frac{(31)(47)}{5} + \cdots + \frac{(50)(64)}{5} - \frac{(407)(530)}{50} = 110.60$$

$$\text{Residual within groups} = 260.80 - 48.30 - 110.60 = 101.90$$
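The same partition extends to sums of products: replacing each square by the product of the corresponding X and Y entries gives the Σxy column, and pairing X with itself gives the sums of squares for X. The sketch below recomputes both from Table 81; the function name is illustrative.

```python
# Table 81: 10 levels of initial ability (rows) x 5 groups (columns)
X = [
    [13, 2, 7, 3, 5], [7, 3, 11, 4, 6], [10, 5, 11, 5, 7], [5, 4, 9, 6, 9],
    [8, 8, 12, 7, 6], [10, 7, 11, 7, 8], [9, 9, 13, 8, 6], [11, 9, 8, 9, 10],
    [5, 12, 11, 10, 11], [7, 11, 9, 11, 12],
]
Y = [
    [13, 8, 8, 8, 7], [7, 13, 9, 8, 10], [9, 7, 8, 7, 9], [4, 4, 10, 9, 13],
    [10, 14, 12, 11, 8], [7, 12, 13, 11, 13], [6, 14, 18, 13, 9], [12, 8, 16, 15, 15],
    [6, 9, 19, 10, 14], [7, 5, 17, 18, 17],
]


def two_way_products(a, b):
    """Partition the total sum of products of two rows x columns tables
    (one observation per cell) into columns, rows, and residual.
    Passing the same table twice gives its sums of squares."""
    n_rows, n_cols = len(a), len(a[0])
    correction = sum(map(sum, a)) * sum(map(sum, b)) / (n_rows * n_cols)
    total = sum(p * q for row_a, row_b in zip(a, b)
                for p, q in zip(row_a, row_b)) - correction
    cols_a = [sum(row[j] for row in a) for j in range(n_cols)]
    cols_b = [sum(row[j] for row in b) for j in range(n_cols)]
    rows_a = [sum(row) for row in a]
    rows_b = [sum(row) for row in b]
    columns = sum(p * q for p, q in zip(cols_a, cols_b)) / n_rows - correction
    rows = sum(p * q for p, q in zip(rows_a, rows_b)) / n_cols - correction
    return {"columns": columns, "rows": rows,
            "residual": total - columns - rows, "total": total}


sx2 = two_way_products(X, X)  # sums of squares for X (Table 83)
sxy = two_way_products(X, Y)  # sums of products of X and Y
for part in ("columns", "rows", "residual", "total"):
    print(f"{part:>8}: sum x^2 = {sx2[part]:7.2f}   sum xy = {sxy[part]:7.2f}")
```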


The results of our calculations up to this point are summarized in Table 84. The total sum of squares and products has been divided into 3 parts. In the last line of the table we have combined the 2 sources in which we are interested: the residual within groups, which has to do with the error term, and that between groups, the significance of which we are interested in determining. The sums of squares and products for rows have been isolated from the error term, and we have no further interest in these. The sums of squares of errors of estimate are given in the column so headed in the table, and they have been obtained from formula (78).

TABLE 84. Sums of Squares and Cross Products for the 5 Groups of Matched Subjects

                                                                    Sums of Squares of
Source of Variation              df       Σx²       Σxy       Σy²   Errors of Estimate   df
Between groups (columns)          4     69.92     48.30    144.20
Between levels (rows)             9     98.82    110.60    166.40
Residual within groups (error)   36    209.28    101.90    397.40               347.78   35
Total                            49    378.02    260.80    708.00
Between groups + error           40    279.20    150.20    541.60               460.80   39

Thus, the sum of squares of errors of estimate for the new experimental total, eliminating the rows from consideration, will be given by

$$541.60 - \frac{(150.20)^2}{279.20} = 460.80$$

and the degrees of freedom for this sum of squares will be 1 less than that for the residual sum of squares within groups plus the sum of squares between groups, or 40 − 1 = 39. The sum of squares of errors of estimate for experimental error will be given by

$$397.40 - \frac{(101.90)^2}{209.28} = 347.78$$

This sum of squares will be based upon 1 degree of freedom less than the residual sum of squares within groups, 397.40, and will thus have 36 − 1 = 35 degrees of freedom. The difference between the 2 sums of squares just calculated will give the adjusted sum of squares between groups. Thus,

$$\text{Adjusted sum of squares between groups} = 460.80 - 347.78 = 113.02$$

The degrees of freedom for this adjusted sum of squares will remain the same, since the adjustment has not required the calculation of a new regression coefficient. This sum of squares
therefore will have 4 degrees of freedom. 
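The adjustment can be retraced from the entries of Table 84. As before, this sketch assumes formula (78) has the form Σy² − (Σxy)²/Σx²; the function name is hypothetical.

```python
def errors_of_estimate(sum_x2: float, sum_xy: float, sum_y2: float) -> float:
    """Sum of squares of errors of estimate (assumed form of formula (78))."""
    return sum_y2 - sum_xy ** 2 / sum_x2


# (sum x^2, sum xy, sum y^2) from Table 84
between = (69.92, 48.30, 144.20)  # 4 df
error = (209.28, 101.90, 397.40)  # 36 df
combined = tuple(p + q for p, q in zip(between, error))  # between groups + error, 40 df

ee_combined = errors_of_estimate(*combined)  # 40 - 1 = 39 df
ee_error = errors_of_estimate(*error)        # 36 - 1 = 35 df
adjusted = ee_combined - ee_error            # 4 df: no new regression coefficient

ms_adjusted = adjusted / 4
ms_error = ee_error / 35
f_ratio = ms_adjusted / ms_error
print(f"adjusted SS = {adjusted:.2f}, F = {ms_adjusted:.2f} / {ms_error:.2f} = {f_ratio:.2f}")
```

Carried to more decimal places, the adjusted sum of squares is about 113.01 and its mean square about 28.25; the 113.02 and 28.26 of the text come from subtracting the rounded values 460.80 and 347.78. F is 2.84 either way.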


9. INTERPRETATION OF THE ANALYSIS 


The covariance analysis is summarized in Table 85.

TABLE 85. Analysis of Covariance of the Scores for the 5 Groups of Matched Subjects

                                 Sum of Squares of
Source of Variation             Errors of Estimate   df   Mean Square      F
Between groups + error                      460.80   39
Residual within groups (error)              347.78   35          9.94
Adjusted means                              113.02    4         28.26   2.84

The test of significance toward which all our calculations have been directed is the value of F obtained by 28.26/9.94 = 2.84. For 4 and 35 degrees of freedom this value has a probability of less than .05 and may be regarded as significant. So we see that the significance of the differences between the means of the various groups on the Y variable cannot be accounted for in terms of the differences between the groups on the X variable. This answers the question which we set out to answer.

We may note that the mean square between groups has been reduced from 36.05, for the analysis of variance, to 28.26, for the analysis of covariance, but that this has been accompanied by a reduction in the error mean square from 11.04, for the analysis of variance, to 9.94, for the analysis of covariance (and also by the loss of 1 degree of freedom for the test of significance). Because both mean squares were reduced, the F of 2.84 obtained from the analysis of covariance differs relatively little from the F of 3.27 obtained from the analysis of variance, and both are significant at the .05 level. This will not always be the case. It could be that the F obtained from the analysis of covariance would not be significant, whereas that obtained from the analysis of variance would be. This would indicate, of course, that the differences between the group means on the Y variable can be accounted for largely in terms of differences between the groups on the X variable. Under this circumstance the mean square for the adjusted means after the analysis of covariance would be reduced greatly, whereas the reduction in the error mean square might be very small.

The adjustment from the analysis of covariance may also give results just the opposite of this. The error mean square from the analysis of covariance may be much smaller than that from the analysis of variance, and this may also be accompanied by an increase in the adjusted mean square between groups. In this instance, obviously, the value of F obtained from the covariance analysis would be much larger than that obtained from the analysis of variance.


10. THE USE OF THE ANALYSIS OF COVARIANCE IN RESEARCH 


The analysis of covariance should prove to be a useful tool 
in psychological and educational research in situations of the kind 
described earlier. In particular, it is applicable to those situations 
where the matching of groups is not feasible prior to the assign- 
ment of the subjects to the experimental conditions, but where some 
measure of initial performance may be obtained after the assign- 
ment. In experiments of this sort, the analysis of covariance may 
be effectively used to reduce the error mean square in the test of 
significance. The other type of experiment in which the analysis 
of covariance should prove useful is where a source of variation 
occurs during the course of the experiment and can be measured 
but not controlled experimentally, as in the examples previously 
described. In experiments of this sort a major question may be 
whether the differences observed under the experimental condi- 
tions appear because of differences on the X variable or in spite of 
such differences. 

The analysis of covariance may also be profitably employed with designs such as the one involved in the analysis of the data of Table 81. There subjects had been matched with respect to level of initial ability, but measurements were also obtained of another variable, occurring during the course of the experiment, which may have conditioned the outcomes.

It is to be emphasized, however, that the application of the 
analysis of covariance requires foresight and planning in the de- 
sign of the experiment. Nothing much will be gained in the 
way of increased precision or new knowledge from the indiscrimi- 
nate application of covariance analysis with merely the hope that 
something good will come of it. 


11. EXAMPLES 


1. Suppose that 45 subjects have been divided at random into 3 groups of 15 subjects each. All 3 groups are given a practice period on a performance test and initial measures are recorded for each subject. The initial performance scores we shall designate as the X variable. At a subsequent period the 3 groups are retested but under different experimental conditions. For example, the 3 groups might be given different dosages of a particular drug, or they might perform under different sets of instructions. Whatever the experimental conditions might happen to be, we shall designate the measures obtained as the Y variable. The X and Y values for each subject in each of the groups are given below.


      Group 1        Group 2        Group 3
      X     Y        X     Y        X     Y
     12    13        5     5        8    11
      6     7        9    15        9    13
      9     9        7    13       10    15
      4     5        9    15        3     7
      7    10        9     9        5     7
      9     7        7    16        7     8
      8     6        9    18       11    10
     10    12        6    15       12    18
      4     6       11    17        8    13
      6     7        9    12        5    10
      3     9       10    11       12    17
      4    14        7     9        6     9
      6     8        9     7        8    13
     11     6        9     8       11    14
     12    10        5     7        4     7


(a) Perform an analysis of variance of the initial measures 
of performance. Does it seem likely that the subjects were 
assigned at random to the 3 groups? (b) Perform an analysis of 
variance of the measures obtained under the experimental con- 
ditions. Relate these findings to the analysis of variance of the 
initial measures. Can you infer that there are significant differences between the experimental conditions? (c) Now perform an analysis of covariance and test the adjusted mean square between groups for significance. Are any conclusions changed by the covariance results?

2. In order to provide practice in a three-part analysis of covariance, let us assume that the subjects in Example 1 are all of the same chronological age and that they are matched across rows on the basis of mental age. The X measures may be assumed to represent values of a variable recorded during the course of the experiment. (a) Analyze the total sum of squares and cross products for X and Y into 3 parts: between groups, between matched subjects, and the residual. (b) Apply the analysis of covariance and test for significance of differences between groups on the Y variable.

3. In an experiment by Mowrer (1934), previously unrotated 
pigeons were tested for clockwise postrotational nystagmus. The 
rate of rotation was one revolution in 144 sec. An average initial
score for each pigeon, based upon 2 tests, is indicated by the
symbol X. The 24 pigeons were then divided into 4 groups of 6
each. Each group was then subjected to 10 daily periods of 
rotation under one of the experimental conditions indicated below. 
The rotation speed was the same as during the initial test and the 
rotation periods lasted 30 sec., with a 30-sec. rest interval between 
each period. Groups 1, 2, and 3 were practiced in a clockwise 
direction only. For Group 4 the environment was rotated in a 
counterclockwise direction. At the end of 24 days of practice, 
each group was tested again under the same conditions as on the 
initial test. These records are called Y. 


      Group 1            Group 2            Group 3            Group 4
   Rotation of body   Rotation of body   Rotation of body   Rotation of
   only. Vision       only. Vision       and environment    environment
   excluded           permitted                             only
   Initial    Final   Initial    Final   Initial    Final   Initial    Final
         X        Y         X        Y         X        Y         X        Y
      23.8      7.9      28.5     25.1      27.5     20.1      22.9     19.9
      23.8      7.1      18.5     20.7      28.1     17.7      25.2     28.2
      2260      7.7      20.3     20.3      35.7     16.8      20.8     18.1
      22.8     11.2      26.6     18.9      13.5     13.5      27.7     30.5
      32.0      6.4      21.2     25.4      25.9     21.0      19.1     19.3
      19.6     10.0      24.0     30.0      27.9     29.3      32.2     35.1


(a) Analyze the total sums of squares and products into 
2 parts: between groups and within groups. (b) Apply the analysis
of covariance and test the adjusted mean square for significance. 
4. Wyatt (1945) gave 8 music students and 8 nonmusic students a pretest on the Seashore pitch test, series B. The subjects in both groups were then trained individually in pitch intonation and pitch discrimination. The training methods were adjusted to the individual subjects, but a description of general procedures can be found in the monograph by Wyatt. At the end of the training period, which lasted one semester, the subjects were retested. The scores given below are based upon an average of 2 to 4 tests per subject.

          Music Group            Nonmusic Group
    Pretest   Posttest      Pretest   Posttest
       33.8       42.5         36.5       38.0
                  43.0         31.0       41.5
                  45.5         31.5       32.0
                  43.0         31.5       32.5
                  40.0         34.0       35.5
                  41.5         32.5       42.0
                  37.0         30.5       38.5
                  43.5         36.5       40.0

(a) Analyze the total sums of squares and products into 2 parts: between groups and within groups. (b) Apply the analysis of covariance to the data and make the test of significance for the differences between the posttest scores for the 2 groups.



Bibliography 


Alexander, H. W. 1946. A general test for trend. Psychol. 
Bull., 43, 533-557. 

Bartlett, M. S. 1934. Recent work on the analysis of variance. 
J. R. statist. Soc. Suppl., 1, 252-255. 

Bartlett, M. S. 1935. The effect of non-normality on the t
distribution. Proc. Camb. phil. Soc., 31, 223-231. 

Bartlett, M. S. 1936. Square-root transformation in analysis 
of variance. J. R. statist. Soc. Suppl., 3, 68-78. 

Bartlett, M. S. 1937. Some examples of statistical methods of 
research in agriculture and applied biology. J. R. statist. 
Soc. Suppl., 4, 137-170. 

Bartlett, M. S. 1947. The use of transformations. Biometrics,
3, 39-52. 

Baxter, B. 1940. The application of factorial design to a psy- 
chological problem. Psychol. Rev., 47, 494-500. 

Baxter, B. 1941. Problems in the planning of psychological 
experiments. Amer. J. Psychol., 54, 270-280. 

Benepe, O. J. 1949. The sensitivity of t and F to departures
from normality. University of Washington: Master’s Thesis. 

Bliss, C. I. 1937. The analysis of field experimental data ex- 
pressed in percentages. Plant Protection, Leningrad, No. 12, 
67-77. 

Bliss, C. I., and Rose, C. L. 1940. The assay of parathyroid
extract from the serum calcium of dogs. Amer. J. Hyg., 31, 
79-98. 

Bloomers, P., and Lindquist, E. F. 1942. Experimental and 
statistical studies: application of newer statistical techniques. 
Rev. educ. Res., 12, 501-520.

Brandt, A. E. 1941. The relations between the design of an
experiment and the analysis of variance. J. Amer. statist.
Ass., 36, 283-292.

Brozek, J., and Alexander, H. 1947. A note on the components 
of variation in a two-way table. Amer. J. Psychol., 60,
629-636. 

Bugelski, B. R. 1949. A note on Grant’s discussion of the Latin 
square principle in the design and analysis of psychological 
experiments. Psychol. Bull., 46, 49-50. 

Burt, C. 1947. A comparison of factor analysis and analysis of 
variance. Brit. J. Psychol. statist. Sect., 1, 3-26. 

Butsch, R. L. C. 1932. Eye movements and the eye-hand span 
in typewriting. J. educ. Psychol., 23, 104-121. 

Child, I. L. 1946. Children’s preference for goals easy or difficult 
to obtain. Psychol. Monogr., No. 280. 

Churchman, C. W. 1948. Theory of experimental inference. 
New York: Macmillan. 

Clark, M., and Worcester, D. A. 1932. A comparison of the 
results obtained from the teaching of shorthand by the word 
unit method and the sentence method. J. educ. Psychol.,
23, 122-131.

Cochran, W. G., and Cox, G. 1944. Experimental design.
Mimeographed.


Cochran, W. G. 1947. Some consequences when the assumptions
for the analysis of variance are not satisfied. Biometrics, 3, 
22-38. 

Crespi, L. P. 1942. Quantitative variation of incentive and 
performance in the white rat. Amer. J. Psychol., 55, 467-517.

Cronbach, L. J. 1949. Statistical methods applied to Rorschach 
test scores: a review. Psychol. Bull., 46, 393-429. 

Crump, S. L. 1946. The estimation of variance components in 
analysis of variance. Biometrics, 2, 7-11. 

Crutchfield, R. S. 1938. Efficient factorial design. J. Psychol., 
5, 339-346.

Crutchfield, R. S., and Tolman, E. C. 1940. Multiple-variable 
design for experiments involving interaction of behavior. 
Psychol. Rev., 47, 38-42. 

Daniels, H. E. 1939. The estimation of components of variance. 
J. R. statist. Soc. Suppl., 6, 186-197. 




De Lury, D. B. 1946. The analysis of Latin Squares when some 
observations are missing. J. Amer. statist. Ass., 41, 370-389. 

Dressel, P. L. 1939. The effect of high school on college grades. 
J. educ. Psychol., 30, 612-617. 

Dunlap, J. W. 1940. Applications of analysis of variance to 
educational problems. J. educ. Res., 33, 434-442.

Dunlap, J. W. 1941. Recent advances in statistical theory and 
application. Amer. J. Psychol., 54, 583-601. 

Edwards, A. L. 1941a. Political frames of reference as a factor 
influencing recognition. J. abnorm. soc. Psychol., 36, 34-50. 

Edwards, A. L. 1941b. Rationalization in recognition as a 
result of a political frame of reference. J. abnorm. soc. 
Psychol., 36, 224-235. 

Edwards, A. L. 1946. Statistical analysis for students in psy- 
chology and education. New York: Rinehart. 

Edwards, A. L. 1947. University of Washington conducts ani- 
mal experimentation survey. Bull. Nat. Soc. med. Res., 2, 
1-4. 

Edwards, A. L. 1948. Note on the “correction for continuity”
in testing the significance of the difference between correlated 
proportions. Psychometrika, 13, 185-187. 

Edwards, A. L. 1950a. On "the use and misuse of the chi- 
square test"—the case of the 2 X 2 contingency table. 
Psychol. Bull., 47, in press. 

Edwards, A. L. 1950b. The use of interactions as “error terms”
in the analysis of variance. Educ. psychol. Measmt., in press.

Edwards, A. L., and Horst, P. 1950. The calculation of sums of 
squares for interactions in the analysis of variance. Psycho- 
metrika, in press. 

Eisenhart, C. 1947a. The assumptions underlying the analysis 
of variance. Biometrics, 3, 1-21. 

Eisenhart, C. 1947b. Inverse sine transformation of propor-
tions. In Selected techniques of statistical analysis, Statistical 
Research Group. New York: McGraw-Hill, pp. 395- 
416. 

Englehart, M. D. 1941. The analysis of variance and covariance
techniques in relation to the conventional formulas for the 
standard error of a difference. Psychometrika, 6, 221-233. 




Fertig, J. W. 1936. The use of interaction in the removal of 
correlated variation. Biometric Bulletin, 1, 1-14. 

Festinger, L. 1943. A statistical test for means of samples from 
skew populations. Psychometrika, 8, 205-210. 

Fisher, R. A. 1921. On the “probable error” of a coefficient of 
correlation. Metron, 1, Part 4, 1-32. 

Fisher, R. A. 1934. Discussion on Dr. Wishart’s paper. J. R. 
statist. Soc. Suppl., 1, 51-53. 

Fisher, R. A. 1936. Statistical methods for research workers. 
(6th ed.) Edinburgh: Oliver & Boyd. 

Fisher, R. A. 1942. The design of experiments. (3d ed.) 
Edinburgh: Oliver & Boyd. 

Fisher, R. A., and Yates, F. 1943. Statistical tables for biologi- 
cal, agricultural and medical research. (2d ed.) Edin- 
burgh: Oliver & Boyd. 

Foster, W. S. 1923. Experiments on rod divining. J. appl. 
Psychol., 7, 303-311. 

Garner, F. H., and Sanders, H. G. 1938. A study of the effect 
of feeding oils to dairy cows and of the value of the Latin 
square lay-out in animal experimentation. J. agric. Sci., 28, 
541-555. 

Garrett, H. E., and Zubin, J. 1943. The analysis of variance in 
psychological research. Psychol. Bull., 40, 233-267.

Gaskill, H. V., and Cox, G. M. 1937. I. Respiration; use of
analysis of variance and covariance in psychological data.
J. gen. Psychol., 16, 21-38.

Gilliland, A. R., and Humphreys, D. W. 1943. Age, sex, method, 
and interval as variables in time estimation. J. genet. 
Psychol., 63, 123-130. 

Glanville, A. D., Kreezer, G. L., and Dallenbach, K. M. 1946. 
The effect of type-size on accuracy of apprehension and 
speed of localizing words. Amer. J. Psychol., 59, 220-235. 

Goulden, C. H. 1939. Methods of statistical analysis. New 
York: Wiley. 

Grafton, J. B. 1948. A statistical analysis of the graduate 
comprehensive examination in the department of psychology. 
University of Washington: Master’s Thesis. 

Graham, F. K., and Kendall, B. S. 1946. Performance of 
brain-damaged cases on a memory for designs test. J. abnorm. 
soc. Psychol., 41, 303-314. 

Grant, D. A. 1944. On “the analysis of variance in psychological 
research.” Psychol. Bull., 41, 158-166. 

Grant, D. A. 1948. The Latin square principle in the design 
and analysis of psychological experiments. Psychol. Bull., 
45, 427-442. 

Grant, D. A. 1949. The statistical analysis of a frequent ex- 
perimental design. Amer. J. Psychol., 62, 119-122. 

Hammond, K. R. 1948. Measuring attitudes by error-choice: 
an indirect method. J. abnorm. soc. Psychol., 43, 
38-48. 

Hartman, G. 1939. Application of individual taste differences 
towards phenylthio-carbamide in genetic investigations. 
Ann. Eugen., Camb., 9, 123-135. 

Hellman, M. 1914. A study of some etiological factors of 
malocclusion. Dent. Cosmos, 56, 1017-1032. 

Hey, B. B. 1938. A new method of experimental sampling 
illustrated on certain non-normal populations. Biometrika, 
30, 68-80. 

Hoel, P. G. 1947. Introduction to mathematical statistics. 
New York: Wiley. 

Horton, G. P. 1948. Unpublished data. Seattle: University 
of Washington. 

Horton, H. B. 1948. A method for obtaining random numbers. 
Ann. math. Statist., 19, 81-85. 

Humphreys, L. G. 1943. The strength of a Thorndikian re- 
sponse as a function of the number of practice trials. J. 
comp. Psychol., 35, 101-110. 

Irwin, J. O. 1931. Mathematical theorems involved in the 
analysis of variance. J. R. statist. Soc., 94, 285-300. 

Irwin, J. O. 1934. On the independence of the constituent 
items in the analysis of variance. J. R. statist. Soc. Suppl., 1, 
236-251. 

Jackson, R. W. B. 1940. Applications of the analysis of variance 
and covariance method to educational problems. Dept. 
Educ. Res. Bull., No. 11, Univ. Toronto. 

Jackson, R. W. B. 1941. Some difficulties in the application of 
the analysis of covariance method to educational problems. 
J. educ. Psychol., 32, 414-422. 

Jelinek, E. M. 1940. On the use of the intra-class correlation 
coefficient in testing the difference of certain variance ratios. 
J. educ. Psychol., 31, 60-63. 

Johnson, P. O., and Tsao, F. 1944. Factorial design in the 
determination of differential limen values. Psychometrika, 
9, 107-145. 

Johnson, P. O., and Tsao, F. 1945. Factorial design and co- 
variance in the study of individual educational development. 
Psychometrika, 10, 133-162. 

Kempthorne, O. 1947. Query No. 48. Biometrics, 3, 98-99. 

Kendall, M. G., and Smith, B. B. 1938. Randomness and ran- 
dom sampling numbers. J. R. statist. Soc., 101, 147-166. 

Kendall, M. G., and Smith, B. B. 1939. Second paper on 
random sampling numbers. J. R. statist. Soc. Suppl., 6, 
51-61. 

Kendall, M. G. 1948a. Who discovered the Latin square? Amer. 
Statist., 2, 13. 

Kendall, M. G. 1948b. The advanced theory of statistics. Vol. 
II (2nd ed.). London: Griffin. 

Kendler, H. H. 1945. Drive interaction: I. Learning as a 
function of the simultaneous presence of the hunger and 
thirst drives. J. exp. Psychol., 35, 96-109. 

Kennedy, J. L. 1939. A methodological review of extra-sensory 
perception. Psychol. Bull., 36, 59-103. 

Kogan, L. S. 1948. Analysis of variance—repeated measure- 
ments. Psychol. Bull., 45, 131-143. 

Kuenne, M. R. 1946. Experimental investigation of the relation 
of language to transposition behavior in young children. 
J. exp. Psychol., 36, 471-490. 

Leahy, A. M. 1935. A study of adopted children as a method 
of investigating nature-nurture. J. Amer. statist. Ass., 30, 
281-287. 

Lewis, D. 1948. Quantitative methods in psychology. Ann 
Arbor: Edwards Brothers. 

Lewis, D., and Burke, C. J. 1949. The use and misuse of the 
chi-square test. Psychol. Bull., 46, 433-489. 


£ 


" 
= 


> 


P 


BIBLIOGRAPHY 365 


Lewis, H. B. 1944. An experimental study of the role of the 
ego in work. I. The role of the ego in cooperative work. 
J. exp. Psychol., 34, 113-126. 

Lewis, H. B., and Franklin, M. 1944. An experimental study 
of the role of the ego in work. II. The significance of 
task-orientation in work. J. exp. Psychol., 34, 195-215. 

Lindquist, E. F. 1940a. Statistical analysis in educational re- 
search. Boston: Houghton Mifflin. 

Lindquist, E. F. 1940b. Sampling in educational research. 
J. educ. Psychol., 31, 561-574. 

Lindquist, E. F. 1947. Goodness of fit of trend curves and 
significance of trend differences. Psychometrika, 12, 65-78. 

Long, L., and Hill, J. 1947. A follow-up study of veterans re- 
ceiving vocational advisement. J. consult. Psychol., 11, 
88-92. 

Loucks, R. B. 1948. Unpublished data. Seattle: University of 
Washington. 


Maier, N. R. F. 1945. Reasoning in humans. III. The mech- 
anisms of equivalent stimuli and of reasoning. J. exp. 
Psychol., 35, 349-360. 

Marks, E. S. 1947. Selective sampling in psychological research. 
Psychol. Bull., 44, 267-275. 

Mather, K. 1947. Statistical analysis in biology. (2d ed.) 
New York: Interscience. 

McNemar, Q. 1940. Sampling in psychological research. Psy- 
chol. Bull., 37, 331-365. 

McNemar, Q. 1947. Note on the sampling error of the difference 
between correlated proportions or percentages. Psycho- 
metrika, 12, 153-157. 

McNemar, Q. 1949. Psychological statistics. New York: Wiley. 

Merritt, C. B., and Fowler, R. G. 1948. The pecuniary honesty 
of the public at large. J. abnorm. soc. Psychol., 43, 
90-93. 

Moore, K. 1944. The effect of controlled temperature changes 
on the behavior of the white rat. J. exp. Psychol., 34, 70-79. 

Morgan, J. J. B. 1945. Value of wrong responses in inductive 
reasoning. J. exp. Psychol., 35, 141-146. 

Mowrer, O. H. 1934. The modification of vestibular nystagmus 
by means of repeated elicitation. Comp. Psychol. Monogr., 
No. 5. 

Newcomb, T. M. 1943. Personality and social change. New 
York: Dryden. 

Nisbet, S. D. 1939. Non-dictated spelling tests. Brit. J. 
educ. Psychol., 9, 29-44. 

Pearson, E. S. 1931. The analysis of variance in cases of non- 
normal variation. Biometrika, 23, 114-133. 

Pearson, K. 1914. Tables for statisticians and biometricians. 
Cambridge: Cambridge Univ. Press. 

Peatman, J. G., and Schafer, R. 1942. A table of random 
numbers from Selective Service numbers. J. Psychol., 14, 
295-305. 

Peters, C. C. 1944. Interaction in analysis of variance inter- 
preted as correlation. Psychol. Bull., 41, 287-299. 

Rosenzweig, S. 1943. An experimental study of “repression” 
with special reference to need-persistive and ego-defensive 
reactions to frustration. J. exp. Psychol., 32, 64-74. 

Shen, E. 1940. Experimental design and statistical treatment 
in educational research. J. exp. Educ., 8, 346-353. 

Sherman, M. 1928. The differentiation of emotional responses 
in infants. J. comp. Psychol., 7, 265-284; 335-351. 

Simpson, T. W. 1938. Experimental methods and human nutri- 
tion. J. R. statist. Soc. Suppl., 5, 46-58. 

Sleight, R. B. 1948. The effect of instrument dial shape on 
legibility. J. appl. Psychol., 32, 170-188. 

Smith, J. G., and Duncan, A. J. 1945. Sampling statistics and 
applications. New York: McGraw-Hill. 

Snedecor, G. W. 1934. Calculation and interpretation of the 
analysis of variance and covariance. Ames, Iowa: Collegiate 
Press. 

Snedecor, G. W. 1935. Analysis of the covariance of statistically 
controlled grades. J. Amer. statist. Ass., 30, 263-268. 

Snedecor, G. W. 1946a. Statistical methods. (4th ed.) Ames, 
Iowa: State College Press. 

Snedecor, G. W. 1946b. Query. Biometrics, 2, 56. 

Spence, K. W. 1944. The nature of theory construction in 
contemporary psychology. Psychol. Rev., 51, 47-68. 

Spence, K. W. 1948. The postulates and methods of 
“behaviorism.” Psychol. Rev., 55, 67-78. 

Taylor, W. S. 1942. The use of interactions in analysing psycho- 
logical data. Brit. J. Psychol., 32, 248-258. 

Terman, L. M., and Merrill, M. A. 1937. Measuring intelligence. 
Boston: Houghton Mifflin. 

Thomson, G. H. 1941. The use of the Latin square in de- 
signing educational experiments. Brit. J. educ. Psychol., 
11, 135-137. 

Tippett, L. H. C. 1927. Tracts for computers, No. 15, Random 
sampling numbers. Cambridge: Cambridge Univ. Press. 

Tippett, L. H. C. 1941. The methods of statistics. (3d ed.) 
London: Williams & Norgate. 

Treloar, A. E. 1942. Random sampling distributions. Min- 
neapolis: Burgess. 

Uspensky, J. V. 1937. Introduction to mathematical proba- 
bility. New York: McGraw-Hill. 

Vinacke, W. E. 1943. Unpublished study. Reported in H. E. 
Garrett & J. Zubin. The analysis of variance in psychologi- 
cal research. Psychol. Bull., 1943, 40, 233-267. 

Walker, H. M. 1940. Degrees of freedom. J. educ. Psychol., 
31, 253-269. 

Walker, H. M. 1943. Elementary statistical methods. New 
York: Holt. 

Walker, H. M. 1947. Certain unsolved statistical problems of 
importance in psychological research. Harvard educ. Rev., 
17, 297-304. 

Wishart, J. 1934. Statistics in agricultural research. J. R. 
statist. Soc. Suppl., 1, 26-51. 

Wishart, J. 1938. Field experiments of factorial design. J. 
agric. Sci., 28, 299-306. 

Wyatt, R. F. 1945. Improvability of pitch discrimination. 
Psychol. Monogr., No. 267. 

Yates, F. 1933a. The formation of Latin squares for use in 
field experiments. Emp. J. exp. Agric., 1, 235-244. 

Yates, F. 1933b. The principles of orthogonality and confound- 
ing in replicated experiments. J. agric. Sci., 23, 108-145. 

Yates, F. 1934a. The analysis of multiple classifications with 
unequal numbers in the different classes. J. Amer. statist. 
Ass., 29, 51-66. 

Yates, F. 1934b. Contingency tables involving small numbers 
and the χ² test. J. R. statist. Soc. Suppl., 1, 217-235. 

Yates, F. 1935. Complex experiments. J. R. statist. Soc. 
Suppl., 2, 181-223. 

Yates, F. 1937. The design and analysis of factorial experi- 
ments. Imp. Bur. Soil Sci., Technical Communication 
No. 35. 

Yates, F., and Cochran, W. G. 1938. The analysis of groups 
of experiments. J. agric. Sci., 28, 556-580. 


Yates, F. 1946. A review of recent statistical developments in 
sampling and sampling surveys. J. R. statist. Soc., 109, 
12-30. 


Yule, G. U. 1938. A test of Tippett’s random sampling numbers. 
J. R. statist. Soc., 101, 167-172. 

Yule, G. U., and Kendall, M. G. 1947. An introduction to the 
theory of statistics. (13th ed.) London: Griffin. 




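List of Formulas 


The number given in the parentheses is used throughout the 
text to refer to the formula. The page on which the formula 
appears in the text is given at the left. 


Page   Number   Formula 

17     (1)      $\bar{X} = \dfrac{\Sigma X}{N}$ 
18     (2)      $s^2 = \dfrac{\Sigma (X - \bar{X})^2}{N - 1}$ 
18     (3)      $s = \sqrt{\dfrac{\Sigma (X - \bar{X})^2}{N - 1}}$ 
19     (4)      $s_{\bar{X}} = \dfrac{s}{\sqrt{N}}$ 
35     (5)      ${}_{n}P_{r} = \dfrac{n!}{(n - r)!}$ 
36     (6)      ${}_{n}C_{r} = \dfrac{n!}{r!\,(n - r)!}$ 
41     (7)      [illegible] 
44     (8)      $(p_1 p_2 p_3 \cdots p_r)(q_1 q_2 q_3 \cdots q_{n-r}) = p^r q^{n-r}$ 
       (9)      $P(r) = {}_{n}C_{r}\, p^r q^{n-r} = \dfrac{n!}{r!\,(n - r)!}\, p^r q^{n-r}$ 
45     (10)     $(p + q)^n = p^n + n p^{n-1} q + \dfrac{n(n - 1)}{2!} p^{n-2} q^2 + \dfrac{n(n - 1)(n - 2)}{3!} p^{n-3} q^3 + \cdots + q^n$ 
58     (11)     $y = \dfrac{1}{\sigma \sqrt{2\pi}}\, e^{-(X - m)^2 / 2\sigma^2}$ 
       (12)     $z = \dfrac{X - m}{\sigma}$ 
       (13)     $m = np$ 
       (14)     $\sigma = \sqrt{npq}$ 
       (15)     $m = p$ 
       (16)     $\sigma_p = \sqrt{\dfrac{pq}{n}}$ 
       (17)     $\chi^2 = \Sigma \dfrac{(o - e)^2}{e}$ 
       (18)     $\chi^2 = \dfrac{(X - np)^2}{npq}$ 
       (19)     $\sigma_{p_1 - p_2} = \sqrt{\sigma_{p_1}^2 + \sigma_{p_2}^2} = \sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}$ 
       (20)     $p = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$ 
       (21)     $\sigma_{p_1 - p_2} = \sqrt{pq\, \dfrac{N}{(n_1)(n_2)}}$ 
       (22)     $z = \dfrac{p_1 - p_2}{\sigma_{p_1 - p_2}}$ 
       (23)     $\chi^2 = \dfrac{N (bc - ad)^2}{(a + b)(c + d)(a + c)(b + d)}$ 
       (24)     $\chi^2 = \dfrac{N \left(|bc - ad| - \frac{N}{2}\right)^2}{(a + b)(c + d)(a + c)(b + d)}$ 
87     (25)     $\sigma_{p_1 - p_2} = \sqrt{\sigma_{p_1}^2 + \sigma_{p_2}^2 - 2 r_{p_1 p_2} \sigma_{p_1} \sigma_{p_2}}$ 
88     (26)     $r_{p_1 p_2} = \dfrac{bc - ad}{\sqrt{(a + b)(c + d)(a + c)(b + d)}}$ 
88     (27)     $\sigma_{p_1 - p_2} = \dfrac{\sqrt{a + d}}{N}$ 
89     (28)     $\chi^2 = \dfrac{(d - a)^2}{a + d}$ 
90     (29)     $p_1 - p_2 = \dfrac{|d - a| - 1}{N}$ 
90     (30)     $\chi^2 = \dfrac{(|d - a| - 1)^2}{a + d}$ 
106    (31)     $\chi^2 = \dfrac{N^2}{(n_a)(n_b)} \left[ \Sigma \dfrac{a^2}{n} - \dfrac{(n_a)^2}{N} \right]$ 
113    (32)     $z = \sqrt{2\chi^2} - \sqrt{2(df) - 1}$ 
       (33)     [illegible] 
122    (34)     [illegible] 
124    (35)     $t = \dfrac{r}{\sqrt{(1 - r^2)/(N - 2)}}$ 
126    (36)     $z' = \tfrac{1}{2} [\log_e (1 + r) - \log_e (1 - r)]$ 
128    (37)     $\sigma_{z'} = \dfrac{1}{\sqrt{N - 3}}$ 
129    (38)     $z'_1 = z' - (1.96)(\sigma_{z'})$ and $z'_2 = z' + (1.96)(\sigma_{z'})$ 
131    (39)     $\sigma_{z'_1 - z'_2} = \sqrt{\sigma_{z'_1}^2 + \sigma_{z'_2}^2} = \sqrt{\dfrac{1}{n_1 - 3} + \dfrac{1}{n_2 - 3}}$ 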
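Several of the formulas above are plain arithmetic that is easy to check by machine: the binomial probability of exactly r successes, the chi-square for a fourfold table with cells a, b, c, d (with and without the N/2 continuity correction), and Fisher's z' with its 95 per cent limits. The following is a minimal Python sketch under those assumptions; the function names are hypothetical, and the guard that stops the continuity correction from going below zero is an addition of this sketch, not part of the printed formula.

```python
import math


def binomial_probability(r: int, n: int, p: float) -> float:
    """Probability of exactly r successes in n trials: nCr * p**r * q**(n - r)."""
    q = 1.0 - p
    return math.comb(n, r) * p**r * q ** (n - r)


def chi_square_2x2(a: int, b: int, c: int, d: int, corrected: bool = False) -> float:
    """Chi-square for a fourfold table laid out as
        a  b
        c  d
    computed as N(bc - ad)^2 / [(a+b)(c+d)(a+c)(b+d)].
    With corrected=True, |bc - ad| is reduced by N/2 (continuity correction)."""
    n = a + b + c + d
    diff = abs(b * c - a * d)
    if corrected:
        # Guard added in this sketch: the correction is not allowed to overshoot zero.
        diff = max(diff - n / 2, 0.0)
    return n * diff**2 / ((a + b) * (c + d) * (a + c) * (b + d))


def fisher_z(r: float) -> float:
    """Fisher's transformation z' = 1/2 [log_e(1 + r) - log_e(1 - r)]."""
    return 0.5 * (math.log(1 + r) - math.log(1 - r))


def z_prime_limits(r: float, n: int, critical: float = 1.96) -> tuple[float, float]:
    """Limits z' -/+ critical * sigma_z', where sigma_z' = 1 / sqrt(N - 3)."""
    z = fisher_z(r)
    sigma = 1 / math.sqrt(n - 3)
    return z - critical * sigma, z + critical * sigma


if __name__ == "__main__":
    print(binomial_probability(2, 4, 0.5))  # 6 * (1/4) * (1/4) = 0.375
    print(chi_square_2x2(10, 20, 30, 40))
    print(chi_square_2x2(10, 20, 30, 40, corrected=True))
    print(z_prime_limits(0.5, 28))
```

The z' limits are on the transformed scale; converting them back to r (for example with `math.tanh`) gives limits for the correlation itself.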


135    (40)     $\chi^2 = \Sigma (n - 3)(z')^2 - \dfrac{[\Sigma (n - 3) z']^2}{\Sigma (n - 3)}$ 
144    (41)     $t = \dfrac{\bar{X} - m}{s_{\bar{X}}}$ 
146    (42)     $m_1 = \bar{X} - (t_{.05})(s_{\bar{X}})$ and $m_2 = \bar{X} + (t_{.05})(s_{\bar{X}})$ 
150    (43)     $\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\sigma_{\bar{X}_1}^2 + \sigma_{\bar{X}_2}^2}$ 
150    (44)     $\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$ 
150    (45)     $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$ 
150    (46)     $s^2 = \dfrac{\Sigma x_1^2 + \Sigma x_2^2}{n_1 + n_2 - 2}$ 
150    (47)     $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\dfrac{\Sigma x_1^2 + \Sigma x_2^2}{n_1 + n_2 - 2} \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}$ 
150    (48)     [illegible] 
151    (49)     [illegible] 
153    (50)     $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\dfrac{2s^2}{n}}$ 
154    (51)     [illegible] 
154    (52)     [illegible] 
154    (53)     [illegible] 
155    (53a)    [illegible] 
155    (53b)    [illegible] 
163    (54)     [illegible] 
168    (55)     $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2}$ 
177    (56)     [illegible] 
178    (57)     [illegible] 
179    (58)     [illegible] 
180    (59)     [illegible] 
181    (60)     [illegible] 
184    (61)     $F = \dfrac{\text{mean square between groups}}{\text{mean square within groups}}$ 
214    (62)     Between groups = [illegible];  Interaction $= \dfrac{[(a + d) - (b + c)]^2}{kn}$ 
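The pooled standard error of a difference between two independent means and the F ratio of mean squares can be computed directly from raw scores. The sketch below is a minimal illustration, assuming Σx² means the sum of squared deviations from each group's own mean; the function names are hypothetical. With only two groups, the F from the one-way analysis equals the square of the pooled t, which the example checks.

```python
import math
from statistics import mean


def sum_sq_dev(values):
    """Sum of squared deviations from the mean (the book's sum of x squared)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def pooled_t(group1, group2):
    """t for two independent means using the pooled estimate
    (sum x1^2 + sum x2^2) / (n1 + n2 - 2) and the standard error
    sqrt(pooled * (1/n1 + 1/n2)).  Returns (t, degrees of freedom)."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    pooled = (sum_sq_dev(group1) + sum_sq_dev(group2)) / df
    se_diff = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / se_diff, df


def one_way_f(*groups):
    """F = mean square between groups / mean square within groups."""
    everything = [v for g in groups for v in g]
    grand_mean = mean(everything)
    k, n_total = len(groups), len(everything)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum_sq_dev(g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


if __name__ == "__main__":
    a = [3, 5, 7]
    b = [6, 8, 10]
    t, df = pooled_t(a, b)
    print(t, df)
    print(one_way_f(a, b), t * t)  # with two groups, F equals t squared
```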


245    (63)     Interaction: $A \times B \times C$ = [illegible] 
246    (64)     $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\dfrac{[\text{illegible}]}{n(n - 1)}}$ 
277    (65)     $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2 - 2 r_{\bar{X}_1 \bar{X}_2} s_{\bar{X}_1} s_{\bar{X}_2}}$ 
287    (66)     R x C interaction = within columns - between rows 
287    (67)     R x C interaction = within rows - between columns 
287    (68)     R x C interaction = total - between rows - between columns 
311    (69)     Total - between columns - between rows - treatments = residual 
313    (70)     $Y' = \bar{Y} + (\bar{Y}_r - \bar{Y}) + (\bar{Y}_c - \bar{Y}) + (\bar{Y}_t - \bar{Y})$ 
313    (71)     $Y' = \bar{Y}_r + \bar{Y}_c + \bar{Y}_t - 2\bar{Y}$ 
314    (72)     $\Sigma (Y - Y')^2$ = residual sum of squares 
336    (73)     $b = \dfrac{\Sigma xy}{\Sigma x^2}$ 
336    (74)     $y' = bx$ 
336    (75)     $y - y' = y - bx$ 
336    (76)     $\Sigma (y - y')^2 = \Sigma (y - bx)^2$ 
337    (77)     $s_{y \cdot x} = s_y \sqrt{1 - r^2}$ 
337    (78)     [illegible] 
339    (79)     [illegible] 
339    (80)     $\Sigma xy = \Sigma\Sigma (X - \bar{X}_i)(Y - \bar{Y}_i) + \Sigma n (\bar{X}_i - m_x)(\bar{Y}_i - m_y)$ 
339    (81)     $\Sigma n (\bar{X}_i - m_x)(\bar{Y}_i - m_y) = \Sigma \dfrac{(\Sigma X_i)(\Sigma Y_i)}{n} - \dfrac{(\Sigma X)(\Sigma Y)}{N}$ 
340    (82)     [illegible] 
340    (83)     [illegible] 
345    (84)     (Sum of squares of errors of estimate for total) - (sum of squares of errors of estimate within groups) = adjusted sum of squares between groups 
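In the analysis of covariance the total sum of products of deviations splits into a within-groups part and a between-groups part, and the between-groups part can also be obtained from group totals alone. The sketch below checks that identity numerically on a small made-up data set; the (xs, ys)-per-group layout and the function names are assumptions for illustration, not notation from the text.

```python
from statistics import mean


def sum_of_products(xs, ys):
    """Sum of products of deviations from the respective means."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))


def partition_products(groups):
    """Split the total sum of products into within- and between-groups parts.

    groups is a list of (xs, ys) pairs, one pair per group (hypothetical layout).
    Returns (total, within, between)."""
    all_x = [x for xs, _ in groups for x in xs]
    all_y = [y for _, ys in groups for y in ys]
    mx, my = mean(all_x), mean(all_y)
    total = sum_of_products(all_x, all_y)
    within = sum(sum_of_products(xs, ys) for xs, ys in groups)
    between = sum(len(xs) * (mean(xs) - mx) * (mean(ys) - my) for xs, ys in groups)
    return total, within, between


def between_from_totals(groups):
    """Computing form: sum over groups of (sum X_i)(sum Y_i)/n_i, minus (sum X)(sum Y)/N."""
    n_total = sum(len(xs) for xs, _ in groups)
    grand_x = sum(sum(xs) for xs, _ in groups)
    grand_y = sum(sum(ys) for _, ys in groups)
    per_group = sum(sum(xs) * sum(ys) / len(xs) for xs, ys in groups)
    return per_group - grand_x * grand_y / n_total


if __name__ == "__main__":
    data = [([1, 2, 3], [2, 4, 5]), ([4, 5, 6], [5, 7, 9])]
    total, within, between = partition_products(data)
    print(total, within, between, between_from_totals(data))
```

The same partition applied to x with itself (replacing every y by x) gives the corresponding split of the sum of squares of the control variable.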


Appendix 



TABLE I. Table of Random Numbers* 


COLUMN NUMBER 


Row | 00000 00000 11111 11111 22222 22222 33333 33333 
01234 56789 01234 56789 01234 56789 01234 56789 
1st Thousand 

00 23157 54859 01837 25993 76249 70886 95230 30744 
01 05545 55043 10537 43508 90611 83744 10902 21343 
02 14871 60350 32404 36223 50051 00322 11543 80834 
03 38976 74951 94051 75853 78805 90194 32428 71695 
04 97312 61718 99755 30870 94251 25841 54882 10513 
05 11742 69381 44339 30872 32797 33118 22647 06850 
06 43361 28859 11016 45623 93009 00499 43640 74036 
07 93806 20478 38208 04491 55751 18932 58475 52571 
08 49540 13181 08429 84187 69538 29661 77738 09527 
09 36768 72633 37948 21569 41959 68670 45274 83880 
10 07092 52392 24627 12067 06558 45344 67338 45320 
11 43310 01081 44863 80307 52555 16148 89742 94647 
12 61570 06360 06173 63775 63148 95123 35017 46993 
13 31352 83799 10779 18941 31579 76448 62584 86919 
14 57048 86526 27795 93692 90529 56546 35065 32254 
15 09243 44200 68721 07137 30729 75756 09298 27650 
16 97957 35018 40894 88329 52230 82521 22532 61587 
17 93732 59570 43781 98885 56671 60826 95996 44569 
18 72621 11225 00922 68264 35666 59434 71687 58167 
19 61020 74418 45371 20794 95917 37866 99536 19378 
20 97839 85474 33055 91718 45473 54144 22034 23000 
21 89160 97192 22232 90637 35055 45489 88438 16361 
22 25966 88220 62871 79265 02823 52862 84919 54883 
23 81443 31719 05049 54806 74690 07567 65017 16543 
24 11322 54931 42362 34386 08624 97687 46245 23245 


* Table I is reproduced from M. G. Kendall and B. B. Smith. Randomness and random sampling numbers. J. R. statist. Soc., 101 (1938), 
147-166, by permission of the Royal Statistical Society. 



TABLE I. Table of Random Numbers*—Continued 


COLUMN NUMBER 
Row | 00000 00000 11111 11111 22222 22222 33333 33333 
01234 56789 01234 56789 01234 56789 01234 56789 

2nd Thousand 
00 64755 83885 84122 25920 17696 15655 95045 95947 
01 10302 52289 77436 34430 38112 49067 07348 23328 
02 71017 98495 51308 50374 66591 02887 53765 69149 
03 60012 55605 88410 34879 79655 90169 78800 03666 
04 37330 94656 49161 42802 48274 54755 44553 65090 
05 47869 87001 31591 12273 60626 12822 34691 61212 
06 38040 42737 64167 89578 39323 49324 88434 38706 
07 73508 30908 83054 80078 86669 30295 56460 45336 
08 32623 46474 84061 04324 20628 37319 32356 43969 
09 97591 99549 36630 35106 62069 92975 95320 57734 
10 74012 31955 59790 96982 66224 24015 96749 07589 
11 56754 26457 13351 05014 90966 33674 69096 33488 
12 49800 49908 54831 21998 08528 26372 92923 65026 
13 43584 89647 24878 56670 00221 50193 99591 62377 
14 16653 79664 60325 71301 35742 83636 73058 87229 
15 48502 69055 65322 58748 31446 80237 31252 96367 
16 96765 54692 36316 86230 48296 38352 23816 64094 
17 38923 61550 80357 81784 23444 12463 33992 28128 
18 77958 81694 25225 05587 51073 01070 60218 61961 
19 17928 28065 25586 08771 02641 85064 65796 48170 
20 94036 85978 02318 04499 41054 10531 87431 21596 
21 47460 60479 56230 48417 14372 85167 27558 00368 
22 47856 56088 51992 82439 40644 17170 13463 18288 
23 57616 34653 92298 62018 10375 76515 62986 90756 
24 08300 92704 66752 66610 57188 79107 54222 22013 



* Table I is reproduced from M. G. Kendall and B. B. Smith. Randomness and random sampling numbers. J. R. statist. Soc., 101 (1938), 
147-166, by permission of the Royal Statistical Society. 



TABLE I. Table of Random Numbers*—Continued 


COLUMN NUMBER 
Row 00000 00000 11111 11111 22222 22222 33333 33333 
01234 56789 01234 56789 01234 56789 01234 56789 
3rd Thousand 

00 89221 02362 65787 74733 51272 30213 92441 39051 
01 04005 99818 63918 29032 94012 42363 01261 10650 
02 98546 38066 50856 75045 40645 22841 53254 44125 
03 41719 84401 59226 01314 54581 40398 49988 65579 
04 28733 72489 00785 25843 24613 49797 85567 84471 
05 65213 83927 77762 03086 80742 24395 68476 83792 
06 65553 12678 90906 90466 43670 26217 69900 31205 
07 05668 69080 73029 85746 58332 78231 45986 92998 
08 39302 99718 49757 79519 76373 47262 91612 
09 64592 32254 45879 29431 05981 18067 87137 
10 07513 48792 47314 83660 05336 82579 91582 
11 86593 68501 56638 99800 35148 56541 07232 
12 83735 22599 97977 81248 99560 32410 67614 
13 08595 21826 54655 08204 17033 56258 05384 
14 11273 27149 44293 69458 63962 15864 35431 
15 00473 175908 56238 12242 72631 76314 47252 06347 
16 86131 53789 81383 07868 89132 96182 07009 86432 
17 33849 78359 08402 03586 03176 88663 08018 22546 
18 61870 41657 07468 08612 98083 97349 20775 45091 
19 43898 65923 25078 86129 78491 97653 91500 80786 
20 29939 39123 04548 45985 60952 06641 28726 46473 
21 38505 85555 14388 55077 18657 94887 67831 70819 
22 31824 38431 67125 25511 72044 11562 53279 82268 
23 91430 03767 13561 15597 06750 92552 02391 38753 
24 38635 68976 25498 97526 96458 03805 04116 63514 


* Table I is reproduced from M. G. Kendall and B. B. Smith. Randomness and random sampling numbers. J. R. statist. Soc., 101 (1938), 
147-166, by permission of the Royal Statistical Society. 


TABLE I. Table of Random Numbers*—Continued 
COLUMN NUMBER 
Row | 00000 00000 11111 11111 22222 22222 33333 33333 
01234 56789 01234 56789 01234 56789 01234 56789 
4th Thousand 
00 02490 54122 27944 39364 94239 72074 11679 54082 
01 11967 36469 60627 83701 09253 30208 01385 37482 
02 48256 83465 49699 24079 05403 35154 39613 03136 
03 27246 73080 21481 23536 04881 89977 40484 93071 
04 32532 77265 72430 70722 86529 18457 92657 10011 
05 66757 98955 92375 93431 43204 55825 45443 69265 
06 11266 34545 76505 97746 34668 26999 26742 97516 
07 17872 39142 45561 80146 93137 48924 64257 59284 
08 62561 30365 03408 14754 51798 08133 61010 97730 
09 62796 30779 35497 70501 30105 08133 00997 91970 
10 75510 21771 04339 33660 42757 62223 87565 
11 87439 01691 63517 26590 44437 07217 98706 
12 97742 02621 10748 78803 38337 65226 92149 
13 98811 06001 21571 02875 21828 83912 85188 
14 51264 01852 64607 92553 29004 26695 78583 
15 40239 93376 10419 68610 49120 02941 80035 
16 26936 59186 51667 27645 46329 44681 94190 
17 88502 11716 98299 40974 42394 62200 69094 81646 
18 63499 38093 25593 61995 79867 80569 01023 38374 
19 36379 81206 03317 78710 73828 31083 60509 44091 
20 93801 22322 47479 57017 59334 30647 43061 26660 
21 29856 87120 56311 50053 25365 81265 22414 02431 
22 97720 87931 88265 13050 71017 15177 06957 92919 
23 85237 09105 74601 46377 59938 15647 34177 92753 
24 75746 75268 31727 95773 72364 87324 36879 06802 


* Table I is reproduced from M. G. Kendall and B. B. Smith. Randomness and random sampling numbers. J. R. statist. Soc., 101 (1938), 
147-166, by permission of the Royal Statistical Society. 



TABLE I. Table of Random Numbers*—Concluded 


COLUMN NUMBER 
Row | 00000 00000 11111 11111 22222 22222 33333 33333 
01234 56789 01234 56789 01234 56789 01234 56789 
5th Thousand 

00 29035 06971 63175 52579 10478 89379 61428 21363 
01 15114 07126 51890 77787 75510 13103 42942 48111 
02 03870 43225 10589 87629 22039 94124 38127 65022 
03 79390 39188 40756 45209 65959 20640 14284 22900 
04 30035 06915 79196 54428 64819 52314 48721 81594 
05 29039 99861 28759 79802 68531 39198 38137 24373 
06 78196 08108 24107 49717 09599 43569 84820 94956 
07 15847 85493 91442 91351 80130 73752 21539 10986 
08 36614 62248 49194 97209 92587 92053 41021 80064 
09 40549 54884 91465 43862 35541 44466 88894 74180 
10 40878 08997 14286 09982 90308 78007 51587 16658 
11 10229 49282 41173 31468 59455 18756 08908 06660 
12 15918 76787 30624 25928 44124 25088 31137 71014 
13 13403 18796 49909 94404 64979 41462 18155 98335 
14 66523 94596 74908 90271 10009 98648 17640 68900 
15 91665 36469 68343 17870 25975 04662 21272 50020 
16 67415 87515 08207 13129 73201 57593 96917 69699 
17 76527 96996 23724 33448 63392 32394 60887 90017 
18 19815 47789 74348 17147 10954 34355 81194 54407 
19 25592 53587 16384 72575 84347 68918 05739 57222 
20 55902 45539 63646 31609 95999 82887 40666 66092 
21 02470 58316 19794 22482 42423 96162 47491 17264 
22 18630 53263 13319 97619 35859 12350 14632 87659 
23 89073 38230 16003 92007 59503 38402 16450 33333 
24 62986 67364 06595 17427 84623 14565 82860 57300 


* Table I is reproduced from M. G. Kendall and B. B. Smith. Randomness and random sampling numbers. J. R. statist. Soc., 101 (1938), 
147-166, by permission of the Royal Statistical Society. 
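A table of random numbers is typically used by reading digits in a fixed direction and keeping only the numbers that fall in the range needed, for example to put N subjects in a random order before assigning them to groups. The sketch below assumes one common convention, which is not a procedure stated in this section: read successive two-digit numbers across the rows, reject 00 and anything above N, and skip numbers already drawn. The function name is hypothetical.

```python
def random_order(rows: list[str], n: int, digits: int = 2) -> list[int]:
    """Read successive `digits`-digit numbers from table rows (read left to right,
    row after row), keep those between 1 and n, and skip repeats."""
    stream = "".join(rows).replace(" ", "")
    chosen: list[int] = []
    for start in range(0, len(stream) - digits + 1, digits):
        value = int(stream[start:start + digits])
        if 1 <= value <= n and value not in chosen:
            chosen.append(value)
            if len(chosen) == n:
                break
    return chosen


if __name__ == "__main__":
    # Rows 00 and 01 of the 1st thousand of Table I.
    row00 = "23157 54859 01837 25993 76249 70886 95230 30744"
    row01 = "05545 55043 10537 43508 90611 83744 10902 21343"
    print(random_order([row00, row01], 10))
```

Two rows do not yield all ten subjects; in practice one keeps reading further rows until the list is complete, and where to start in the table should itself be chosen without regard to the outcome.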




APPENDIX 383 


TABLE II. Table of Squares, Square Roots, and Reciprocals 
of Numbers from 1 to 1,000* 


  N     N²     √N       1/N          N     N²     √N       1/N 

  1      1   1.0000   1.000000      41   1681   6.4031   .024390 
  2      4   1.4142    .500000      42   1764   6.4807   .023810 
  3      9   1.7321    .333333      43   1849   6.5574   .023256 
  4     16   2.0000    .250000      44   1936   6.6332   .022727 
  5     25   2.2361    .200000      45   2025   6.7082   .022222 

  6     36   2.4495    .166667      46   2116   6.7823   .021739 
  7     49   2.6458    .142857      47   2209   6.8557   .021277 
  8     64   2.8284    .125000      48   2304   6.9282   .020833 
  9     81   3.0000    .111111      49   2401   7.0000   .020408 
 10    100   3.1623    .100000      50   2500   7.0711   .020000 

 11    121   3.3166    .090909      51   2601   7.1414   .019608 
 12    144   3.4641    .083333      52   2704   7.2111   .019231 
 13    169   3.6056    .076923      53   2809   7.2801   .018868 
 14    196   3.7417    .071429      54   2916   7.3485   .018519 
 15    225   3.8730    .066667      55   3025   7.4162   .018182 

 16    256   4.0000    .062500      56   3136   7.4833   .017857 
 17    289   4.1231    .058824      57   3249   7.5498   .017544 
 18    324   4.2426    .055556      58   3364   7.6158   .017241 
 19    361   4.3589    .052632      59   3481   7.6811   .016949 
 20    400   4.4721    .050000      60   3600   7.7460   .016667 

 21    441   4.5826    .047619      61   3721   7.8102   .016393 
 22    484   4.6904    .045455      62   3844   7.8740   .016129 
 23    529   4.7958    .043478      63   3969   7.9373   .015873 
 24    576   4.8990    .041667      64   4096   8.0000   .015625 
 25    625   5.0000    .040000      65   4225   8.0623   .015385 

 26    676   5.0990    .038462      66   4356   8.1240   .015152 
 27    729   5.1962    .037037      67   4489   8.1854   .014925 
 28    784   5.2915    .035714      68   4624   8.2462   .014706 
 29    841   5.3852    .034483      69   4761   8.3066   .014493 
 30    900   5.4772    .033333      70   4900   8.3666   .014286 

 31    961   5.5678    .032258      71   5041   8.4261   .014085 
 32   1024   5.6569    .031250      72   5184   8.4853   .013889 
 33   1089   5.7446    .030303      73   5329   8.5440   .013699 
 34   1156   5.8310    .029412      74   5476   8.6023   .013514 
 35   1225   5.9161    .028571      75   5625   8.6603   .013333 

 36   1296   6.0000    .027778      76   5776   8.7178   .013158 
 37   1369   6.0828    .027027      77   5929   8.7750   .012987 
 38   1444   6.1644    .026316      78   6084   8.8318   .012821 
 39   1521   6.2450    .025641      79   6241   8.8882   .012658 
 40   1600   6.3246    .025000      80   6400   8.9443   .012500 


* Portions of Table II have been reproduced from J. W. Dunlap and A. K. Kurtz. 
Handbook of Statistical Nomographs, Tables, and Formulas, World Book Company, New York 
(1932), by permission of the authors and publishers. 




TABLE II. Table of Squares, Square Roots, and Reciprocals 
of Numbers from 1 to 1,000*—Continued 




  N      N²      √N        1/N          N      N²      √N        1/N 

 81    6561    9.0000    .012346      121   14641   11.0000   .00826446 
 82    6724    9.0554    .012195      122   14884   11.0454   .00819672 
 83    6889    9.1104    .012048      123   15129   11.0905   .00813008 
 84    7056    9.1652    .011905      124   15376   11.1355   .00806452 
 85    7225    9.2195    .011765      125   15625   11.1803   .00800000 

 86    7396    9.2736    .011628      126   15876   11.2250   .00793651 
 87    7569    9.3274    .011494      127   16129   11.2694   .00787402 
 88    7744    9.3808    .011364      128   16384   11.3137   .00781250 
 89    7921    9.4340    .011236      129   16641   11.3578   .00775194 
 90    8100    9.4868    .011111      130   16900   11.4018   .00769231 

 91    8281    9.5394    .010989      131   17161   11.4455   .00763359 
 92    8464    9.5917    .010870      132   17424   11.4891   .00757576 
 93    8649    9.6437    .010753      133   17689   11.5326   .00751880 
 94    8836    9.6954    .010638      134   17956   11.5758   .00746269 
 95    9025    9.7468    .010526      135   18225   11.6190   .00740741 

 96    9216    9.7980    .010417      136   18496   11.6619   .00735294 
 97    9409    9.8489    .010309      137   18769   11.7047   .00729927 
 98    9604    9.8995    .010204      138   19044   11.7473   .00724638 
 99    9801    9.9499    .010101      139   19321   11.7898   .00719424 
100   10000   10.0000    .010000      140   19600   11.8322   .00714286 

101   10201   10.0499   .00990099     141   19881   11.8743   .00709220 
102   10404   10.0995   .00980392     142   20164   11.9164   .00704225 
103   10609   10.1489   .00970874     143   20449   11.9583   .00699301 
104   10816   10.1980   .00961538     144   20736   12.0000   .00694444 
105   11025   10.2470   .00952381     145   21025   12.0416   .00689655 

106   11236   10.2956   .00943396     146   21316   12.0830   .00684932 
107   11449   10.3441   .00934579     147   21609   12.1244   .00680272 
108   11664   10.3923   .00925926     148   21904   12.1655   .00675676 
109   11881   10.4403   .00917431     149   22201   12.2066   .00671141 
110   12100   10.4881   .00909091     150   22500   12.2474   .00666667 

111   12321   10.5357   .00900901     151   22801   12.2882   .00662252 
112   12544   10.5830   .00892857     152   23104   12.3288   .00657895 
113   12769   10.6301   .00884956     153   23409   12.3693   .00653595 
114   12996   10.6771   .00877193     154   23716   12.4097   .00649351 
115   13225   10.7238   .00869565     155   24025   12.4499   .00645161 

116   13456   10.7703   .00862069     156   24336   12.4900   .00641026 
117   13689   10.8167   .00854701     157   24649   12.5300   .00636943 
118   13924   10.8628   .00847458     158   24964   12.5698   .00632911 
119   14161   10.9087   .00840336     159   25281   12.6095   .00628931 
120   14400   10.9545   .00833333     160   25600   12.6491   .00625000 




* Portions of Table II have been reproduced from J. W. Dunlap and A. K. Kurtz. 
Handbook of Statistical Nomographs, Tables, and Formulas, World Book Company, New York 
(1932), by permission of the authors and publishers. 

TABLE II. Table of Squares, Square Roots, and Reciprocals 
of Numbers from 1 to 1,000*—Continued 


  N      N²      √N        1/N          N      N²      √N        1/N 

161   25921   12.6886   .00621118     201   40401   14.1774   .00497512 
162   26244   12.7279   .00617284     202   40804   14.2127   .00495050 
163   26569   12.7671   .00613497     203   41209   14.2478   .00492611 
164   26896   12.8062   .00609756     204   41616   14.2829   .00490196 
165   27225   12.8452   .00606061     205   42025   14.3178   .00487805 

166   27556   12.8841   .00602410     206   42436   14.3527   .00485437 
167   27889   12.9228   .00598802     207   42849   14.3875   .00483092 
168   28224   12.9615   .00595238     208   43264   14.4222   .00480769 
169   28561   13.0000   .00591716     209   43681   14.4568   .00478469 
170   28900   13.0384   .00588235     210   44100   14.4914   .00476190 

171   29241   13.0767   .00584795     211   44521   14.5258   .00473934 
172   29584   13.1149   .00581395     212   44944   14.5602   .00471698 
173   29929   13.1529   .00578035     213   45369   14.5945   .00469484 
174   30276   13.1909   .00574713     214   45796   14.6287   .00467290 
175   30625   13.2288   .00571429     215   46225   14.6629   .00465116 

176   30976   13.2665   .00568182     216   46656   14.6969   .00462963 
177   31329   13.3041   .00564972     217   47089   14.7309   .00460829 
178   31684   13.3417   .00561798     218   47524   14.7648   .00458716 
179   32041   13.3791   .00558659     219   47961   14.7986   .00456621 
180   32400   13.4164   .00555556     220   48400   14.8324   .00454545 

181   32761   13.4536   .00552486     221   48841   14.8661   .00452489 
182   33124   13.4907   .00549451     222   49284   14.8997   .00450450 
183   33489   13.5277   .00546448     223   49729   14.9332   .00448430 
184   33856   13.5647   .00543478     224   50176   14.9666   .00446429 
185   34225   13.6015   .00540541     225   50625   15.0000   .00444444 

186   34596   13.6382   .00537634     226   51076   15.0333   .00442478 
187   34969   13.6748   .00534759     227   51529   15.0665   .00440529 
188   35344   13.7113   .00531915     228   51984   15.0997   .00438596 
189   35721   13.7477   .00529101     229   52441   15.1327   .00436681 
190   36100   13.7840   .00526316     230   52900   15.1658   .00434783 

191   36481   13.8203   .00523560     231   53361   15.1987   .00432900 
192   36864   13.8564   .00520833     232   53824   15.2315   .00431034 
193   37249   13.8924   .00518135     233   54289   15.2643   .00429185 
194   37636   13.9284   .00515464     234   54756   15.2971   .00427350 
195   38025   13.9642   .00512821     235   55225   15.3297   .00425532 

196   38416   14.0000   .00510204     236   55696   15.3623   .00423729 
197   38809   14.0357   .00507614     237   56169   15.3948   .00421941 
198   39204   14.0712   .00505051     238   56644   15.4272   .00420168 
199   39601   14.1067   .00502513     239   57121   15.4596   .00418410 
200   40000   14.1421   .00500000     240   57600   15.4919   .00416667 
* Portions of Table II have been reproduced from J. W. Dunlap and A. K. Kurtz. 
Handbook of Statistical Nomographs, Tables, and Formulas, World Book Company, New York 
(1932), by permission of the authors and publishers. 




TABLE II. Table of Squares, Square Roots, and Reciprocals 
of Numbers from 1 to 1,000*—Continued 


  N      N²      √N        1/N          N      N²      √N        1/N 

241   58081   15.5242   .00414938     281   78961   16.7631   .00355872 
242   58564   15.5563   .00413223     282   79524   16.7929   .00354610 
243   59049   15.5885   .00411523     283   80089   16.8226   .00353357 
244   59536   15.6205   .00409836     284   80656   16.8523   .00352113 
245   60025   15.6525   .00408163     285   81225   16.8819   .00350877 

246   60516   15.6844   .00406504     286   81796   16.9115   .00349650 
247   61009   15.7162   .00404858     287   82369   16.9411   .00348432 
248   61504   15.7480   .00403226     288   82944   16.9706   .00347222 
249   62001   15.7797   .00401606     289   83521   17.0000   .00346021 
250   62500   15.8114   .00400000     290   84100   17.0294   .00344828 

251   63001   15.8430   .00398406     291   84681   17.0587   .00343643 
252   63504   15.8745   .00396825     292   85264   17.0880   .00342466 
253   64009   15.9060   .00395257     293   85849   17.1172   .00341297 
254   64516   15.9374   .00393701     294   86436   17.1464   .00340136 
255   65025   15.9687   .00392157     295   87025   17.1756   .00338983 

256   65536   16.0000   .00390625     296   87616   17.2047   .00337838 
257   66049   16.0312   .00389105     297   88209   17.2337   .00336700 
258   66564   16.0624   .00387597     298   88804   17.2627   .00335570 
259   67081   16.0935   .00386100     299   89401   17.2916   .00334448 
260   67600   16.1245   .00384615     300   90000   17.3205   .00333333 

261   68121   16.1555   .00383142     301   90601   17.3494   .00332226 
262   68644   16.1864   .00381679     302   91204   17.3781   .00331126 
263   69169   16.2173   .00380228     303   91809   17.4069   .00330033 
264   69696   16.2481   .00378788     304   92416   17.4356   .00328947 
265   70225   16.2788   .00377358     305   93025   17.4642   .00327869 

266   70756   16.3095   .00375940     306   93636   17.4929   .00326797 
267   71289   16.3401   .00374532     307   94249   17.5214   .00325733 
268   71824   16.3707   .00373134     308   94864   17.5499   .00324675 
269   72361   16.4012   .00371747     309   95481   17.5784   .00323625 
270   72900   16.4317   .00370370     310   96100   17.6068   .00322581 

271   73441   16.4621   .00369004     311   96721   17.6352   .00321543 
272   73984   16.4924   .00367647     312   97344   17.6635   .00320513 
273   74529   16.5227   .00366300     313   97969   17.6918   .00319489 
274   75076   16.5529   .00364964     314   98596   17.7200   .00318471 
275   75625   16.5831   .00363636     315   99225   17.7482   .00317460 

276   76176   16.6132   .00362319     316   99856   17.7764   .00316456 
277   76729   16.6433   .00361011     317  100489   17.8045   .00315457 
278   77284   16.6733   .00359712     318  101124   17.8326   .00314465 
279   77841   16.7033   .00358423     319  101761   17.8606   .00313480 
280   78400   16.7332   .00357143     320  102400   17.8885   .00312500 
TABLE II. Table of Squares, Square Roots, and Reciprocals 
of Numbers from 1 to 1,000*—Continued

N N² √N 1/N || N N² √N 1/N


321 103041 17.9165 .00311526 || 361 130321 19.0000 .00277008
322 103684 17.9444 .00310559 || 362 131044 19.0263 .00276243
323 104329 17.9722 .00309598 || 363 131769 19.0526 .00275482
324 104976 18.0000 .00308642 || 364 132496 19.0788 .00274725
325 105625 18.0278 .00307692 || 365 133225 19.1050 .00273973

326 106276 18.0555 .00306748 || 366 133956 19.1311 .00273224
327 106929 18.0831 .00305810 || 367 134689 19.1572 .00272480
328 107584 18.1108 .00304878 || 368 135424 19.1833 .00271739
329 108241 18.1384 .00303951 || 369 136161 19.2094 .00271003
330 108900 18.1659 .00303030 || 370 136900 19.2354 .00270270

331 109561 18.1934 .00302115 || 371 137641 19.2614 .00269542
332 110224 18.2209 .00301205 || 372 138384 19.2873 .00268817
333 110889 18.2483 .00300300 || 373 139129 19.3132 .00268097
334 111556 18.2757 .00299401 || 374 139876 19.3391 .00267380
335 112225 18.3030 .00298507 || 375 140625 19.3649 .00266667

336 112896 18.3303 .00297619 || 376 141376 19.3907 .00265957
337 113569 18.3576 .00296736 || 377 142129 19.4165 .00265252
338 114244 18.3848 .00295858 || 378 142884 19.4422 .00264550
339 114921 18.4120 .00294985 || 379 143641 19.4679 .00263852
340 115600 18.4391 .00294118 || 380 144400 19.4936 .00263158

341 116281 18.4662 .00293255 || 381 145161 19.5192 .00262467
342 116964 18.4932 .00292398 || 382 145924 19.5448 .00261780
343 117649 18.5203 .00291545 || 383 146689 19.5704 .00261097
344 118336 18.5472 .00290698 || 384 147456 19.5959 .00260417
345 119025 18.5742 .00289855 || 385 148225 19.6214 .00259740

346 119716 18.6011 .00289017 || 386 148996 19.6469 .00259067
347 120409 18.6279 .00288184 || 387 149769 19.6723 .00258398
348 121104 18.6548 .00287356 || 388 150544 19.6977 .00257732
349 121801 18.6815 .00286533 || 389 151321 19.7231 .00257069
350 122500 18.7083 .00285714 || 390 152100 19.7484 .00256410

351 123201 18.7350 .00284900 || 391 152881 19.7737 .00255754
352 123904 18.7617 .00284091 || 392 153664 19.7990 .00255102
353 124609 18.7883 .00283286 || 393 154449 19.8242 .00254453
354 125316 18.8149 .00282486 || 394 155236 19.8494 .00253807
355 126025 18.8414 .00281690 || 395 156025 19.8746 .00253165

356 126736 18.8680 .00280899 || 396 156816 19.8997 .00252525
357 127449 18.8944 .00280112 || 397 157609 19.9249 .00251889
358 128164 18.9209 .00279330 || 398 158404 19.9499 .00251256
359 128881 18.9473 .00278552 || 399 159201 19.9750 .00250627
360 129600 18.9737 .00277778 || 400 160000 20.0000 .00250000


TABLE II. Table of Squares, Square Roots, and Reciprocals
of Numbers from 1 to 1,000*—Continued

N N² √N 1/N || N N² √N 1/N

401 160801 20.0250 .00249377 || 441 194481 21.0000 .00226757
402 161604 20.0499 .00248756 || 442 195364 21.0238 .00226244
403 162409 20.0749 .00248139 || 443 196249 21.0476 .00225734
404 163216 20.0998 .00247525 || 444 197136 21.0713 .00225225
405 164025 20.1246 .00246914 || 445 198025 21.0950 .00224719
406 164836 20.1494 .00246305 || 446 198916 21.1187 .00224215
407 165649 20.1742 .00245700 || 447 199809 21.1424 .00223714
408 166464 20.1990 .00245098 || 448 200704 21.1660 .00223214
409 167281 20.2237 .00244499 || 449 201601 21.1896 .00222717
410 168100 20.2485 .00243902 || 450 202500 21.2132 .00222222
411 168921 20.2731 .00243309 || 451 203401 21.2368 .00221729
412 169744 20.2978 .00242718 || 452 204304 21.2603 .00221239
413 170569 20.3224 .00242131 || 453 205209 21.2838 .00220751
414 171396 20.3470 .00241546 || 454 206116 21.3073 .00220264
415 172225 20.3715 .00240964 || 455 207025 21.3307 .00219780
416 173056 20.3961 .00240385 || 456 207936 21.3542 .00219298
417 173889 20.4206 .00239808 || 457 208849 21.3776 .00218818
418 174724 20.4450 .00239234 || 458 209764 21.4009 .00218341
419 175561 20.4695 .00238663 || 459 210681 21.4243 .00217865
420 176400 20.4939 .00238095 || 460 211600 21.4476 .00217391
421 177241 20.5183 .00237530 || 461 212521 21.4709 .00216920
422 178084 20.5426 .00236967 || 462 213444 21.4942 .00216450
423 178929 20.5670 .00236407 || 463 214369 21.5174 .00215983
424 179776 20.5913 .00235849 || 464 215296 21.5407 .00215517
425 180625 20.6155 .00235294 || 465 216225 21.5639 .00215054
426 181476 20.6398 .00234742 || 466 217156 21.5870 .00214592
427 182329 20.6640 .00234192 || 467 218089 21.6102 .00214133
428 183184 20.6882 .00233645 || 468 219024 21.6333 .00213675
429 184041 20.7123 .00233100 || 469 219961 21.6564 .00213220
430 184900 20.7364 .00232558 || 470 220900 21.6795 .00212766
431 185761 20.7605 .00232019 || 471 221841 21.7025 .00212314
432 186624 20.7846 .00231481 || 472 222784 21.7256 .00211864
433 187489 20.8087 .00230947 || 473 223729 21.7486 .00211416
434 188356 20.8327 .00230415 || 474 224676 21.7715 .00210970
435 189225 20.8567 .00229885 || 475 225625 21.7945 .00210526
436 190096 20.8806 .00229358 || 476 226576 21.8174 .00210084
437 190969 20.9045 .00228833 || 477 227529 21.8403 .00209644
438 191844 20.9284 .00228311 || 478 228484 21.8632 .00209205
439 192721 20.9523 .00227790 || 479 229441 21.8861 .00208768
440 193600 20.9762 .00227273 || 480 230400 21.9089 .00208333


TABLE II. Table of Squares, Square Roots, and Reciprocals 
of Numbers from 1 to 1,000*—Continued 
N N² √N 1/N || N N² √N 1/N
481 231361 21.9317 .00207900 || 521 271441 22.8254 .00191939
482 232324 21.9545 .00207469 || 522 272484 22.8473 .00191571
483 233289 21.9773 .00207039 || 523 273529 22.8692 .00191205
484 234256 22.0000 .00206612 || 524 274576 22.8910 .00190840
485 235225 22.0227 .00206186 || 525 275625 22.9129 .00190476
486 236196 22.0454 .00205761 || 526 276676 22.9347 .00190114
487 237169 22.0681 .00205339 || 527 277729 22.9565 .00189753
488 238144 22.0907 .00204918 || 528 278784 22.9783 .00189394
489 239121 22.1133 .00204499 || 529 279841 23.0000 .00189036
490 240100 22.1359 .00204082 || 530 280900 23.0217 .00188679
491 241081 22.1585 .00203666 || 531 281961 23.0434 .00188324
492 242064 22.1811 .00203252 || 532 283024 23.0651 .00187970
493 243049 22.2036 .00202840 || 533 284089 23.0868 .00187617
494 244036 22.2261 .00202429 || 534 285156 23.1084 .00187266
495 245025 22.2486 .00202020 || 535 286225 23.1301 .00186916
496 246016 22.2711 .00201613 || 536 287296 23.1517 .00186567
497 247009 22.2935 .00201207 || 537 288369 23.1733 .00186220
498 248004 22.3159 .00200803 || 538 289444 23.1948 .00185874
499 249001 22.3383 .00200401 || 539 290521 23.2164 .00185529
500 250000 22.3607 .00200000 || 540 291600 23.2379 .00185185
501 251001 22.3830 .00199601 || 541 292681 23.2594 .00184843
502 252004 22.4054 .00199203 || 542 293764 23.2809 .00184502
503 253009 22.4277 .00198807 || 543 294849 23.3024 .00184162
504 254016 22.4499 .00198413 || 544 295936 23.3238 .00183824
505 255025 22.4722 .00198020 || 545 297025 23.3452 .00183486
506 256036 22.4944 .00197628 || 546 298116 23.3666 .00183150
507 257049 22.5167 .00197239 || 547 299209 23.3880 .00182815
508 258064 22.5389 .00196850 || 548 300304 23.4094 .00182482
509 259081 22.5610 .00196464 || 549 301401 23.4307 .00182149
510 260100 22.5832 .00196078 || 550 302500 23.4521 .00181818
511 261121 22.6053 .00195695 || 551 303601 23.4734 .00181488
512 262144 22.6274 .00195312 || 552 304704 23.4947 .00181159
513 263169 22.6495 .00194932 || 553 305809 23.5160 .00180832
514 264196 22.6716 .00194553 || 554 306916 23.5372 .00180505
515 265225 22.6936 .00194175 || 555 308025 23.5584 .00180180
516 266256 22.7156 .00193798 || 556 309136 23.5797 .00179856
517 267289 22.7376 .00193424 || 557 310249 23.6008 .00179533
518 268324 22.7596 .00193050 || 558 311364 23.6220 .00179211
519 269361 22.7816 .00192678 || 559 312481 23.6432 .00178891
520 270400 22.8035 .00192308 || 560 313600 23.6643 .00178571




TABLE II. Table of Squares, Square Roots, and Reciprocals 
of Numbers from 1 to 1,000*—Continued 


N N² √N 1/N || N N² √N 1/N


561 314721 23.6854 .00178253 || 601 361201 24.5153 .00166389
562 315844 23.7065 .00177936 || 602 362404 24.5357 .00166113
563 316969 23.7276 .00177620 || 603 363609 24.5561 .00165837
564 318096 23.7487 .00177305 || 604 364816 24.5764 .00165563
565 319225 23.7697 .00176991 || 605 366025 24.5967 .00165289

566 320356 23.7908 .00176678 || 606 367236 24.6171 .00165017
567 321489 23.8118 .00176367 || 607 368449 24.6374 .00164745
568 322624 23.8328 .00176056 || 608 369664 24.6577 .00164474
569 323761 23.8537 .00175747 || 609 370881 24.6779 .00164204
570 324900 23.8747 .00175439 || 610 372100 24.6982 .00163934

571 326041 23.8956 .00175131 || 611 373321 24.7184 .00163666
572 327184 23.9165 .00174825 || 612 374544 24.7386 .00163399
573 328329 23.9374 .00174520 || 613 375769 24.7588 .00163132
574 329476 23.9583 .00174216 || 614 376996 24.7790 .00162866
575 330625 23.9792 .00173913 || 615 378225 24.7992 .00162602

576 331776 24.0000 .00173611 || 616 379456 24.8193 .00162338
577 332929 24.0208 .00173310 || 617 380689 24.8395 .00162075
578 334084 24.0416 .00173010 || 618 381924 24.8596 .00161812
579 335241 24.0624 .00172712 || 619 383161 24.8797 .00161551
580 336400 24.0832 .00172414 || 620 384400 24.8998 .00161290

581 337561 24.1039 .00172117 || 621 385641 24.9199 .00161031
582 338724 24.1247 .00171821 || 622 386884 24.9399 .00160772
583 339889 24.1454 .00171527 || 623 388129 24.9600 .00160514
584 341056 24.1661 .00171233 || 624 389376 24.9800 .00160256
585 342225 24.1868 .00170940 || 625 390625 25.0000 .00160000

586 343396 24.2074 .00170648 || 626 391876 25.0200 .00159744
587 344569 24.2281 .00170358 || 627 393129 25.0400 .00159490
588 345744 24.2487 .00170068 || 628 394384 25.0599 .00159236
589 346921 24.2693 .00169779 || 629 395641 25.0799 .00158983
590 348100 24.2899 .00169492 || 630 396900 25.0998 .00158730

591 349281 24.3105 .00169205 || 631 398161 25.1197 .00158479
592 350464 24.3311 .00168919 || 632 399424 25.1396 .00158228
593 351649 24.3516 .00168634 || 633 400689 25.1595 .00157978
594 352836 24.3721 .00168350 || 634 401956 25.1794 .00157729
595 354025 24.3926 .00168067 || 635 403225 25.1992 .00157480

596 355216 24.4131 .00167785 || 636 404496 25.2190 .00157233
597 356409 24.4336 .00167504 || 637 405769 25.2389 .00156986
598 357604 24.4540 .00167224 || 638 407044 25.2587 .00156740
599 358801 24.4745 .00166945 || 639 408321 25.2784 .00156495
600 360000 24.4949 .00166667 || 640 409600 25.2982 .00156250




TABLE II. Table of Squares, Square Roots, and Reciprocals 
of Numbers from 1 to 1,000*—Continued 
N N² √N 1/N || N N² √N 1/N


641 410881 25.3180 .00156006 || 681 463761 26.0960 .00146843
642 412164 25.3377 .00155763 || 682 465124 26.1151 .00146628
643 413449 25.3574 .00155521 || 683 466489 26.1343 .00146413
644 414736 25.3772 .00155280 || 684 467856 26.1534 .00146199
645 416025 25.3969 .00155039 || 685 469225 26.1725 .00145985

646 417316 25.4165 .00154799 || 686 470596 26.1916 .00145773
647 418609 25.4362 .00154560 || 687 471969 26.2107 .00145560
648 419904 25.4558 .00154321 || 688 473344 26.2298 .00145349
649 421201 25.4755 .00154083 || 689 474721 26.2488 .00145138
650 422500 25.4951 .00153846 || 690 476100 26.2679 .00144928

651 423801 25.5147 .00153610 || 691 477481 26.2869 .00144718
652 425104 25.5343 .00153374 || 692 478864 26.3059 .00144509
653 426409 25.5539 .00153139 || 693 480249 26.3249 .00144300
654 427716 25.5734 .00152905 || 694 481636 26.3439 .00144092
655 429025 25.5930 .00152672 || 695 483025 26.3629 .00143885

656 430336 25.6125 .00152439 || 696 484416 26.3818 .00143678
657 431649 25.6320 .00152207 || 697 485809 26.4008 .00143472
658 432964 25.6515 .00151976 || 698 487204 26.4197 .00143266
659 434281 25.6710 .00151745 || 699 488601 26.4386 .00143062
660 435600 25.6905 .00151515 || 700 490000 26.4575 .00142857

661 436921 25.7099 .00151286 || 701 491401 26.4764 .00142653
662 438244 25.7294 .00151057 || 702 492804 26.4953 .00142450
663 439569 25.7488 .00150830 || 703 494209 26.5141 .00142248
664 440896 25.7682 .00150602 || 704 495616 26.5330 .00142045
665 442225 25.7876 .00150376 || 705 497025 26.5518 .00141844

666 443556 25.8070 .00150150 || 706 498436 26.5707 .00141643
667 444889 25.8263 .00149925 || 707 499849 26.5895 .00141443
668 446224 25.8457 .00149701 || 708 501264 26.6083 .00141243
669 447561 25.8650 .00149477 || 709 502681 26.6271 .00141044
670 448900 25.8844 .00149254 || 710 504100 26.6458 .00140845

671 450241 25.9037 .00149031 || 711 505521 26.6646 .00140647
672 451584 25.9230 .00148810 || 712 506944 26.6833 .00140449
673 452929 25.9422 .00148588 || 713 508369 26.7021 .00140252
674 454276 25.9615 .00148368 || 714 509796 26.7208 .00140056
675 455625 25.9808 .00148148 || 715 511225 26.7395 .00139860

676 456976 26.0000 .00147929 || 716 512656 26.7582 .00139665
677 458329 26.0192 .00147710 || 717 514089 26.7769 .00139470
678 459684 26.0384 .00147493 || 718 515524 26.7955 .00139276
679 461041 26.0576 .00147275 || 719 516961 26.8142 .00139082
680 462400 26.0768 .00147059 || 720 518400 26.8328 .00138889


TABLE II. Table of Squares, Square Roots, and Reciprocals 
of Numbers from 1 to 1,000*—Continued 


N N² √N 1/N || N N² √N 1/N
721 519841 26.8514 .00138696 || 761 579121 27.5862 .00131406
722 521284 26.8701 .00138504 || 762 580644 27.6043 .00131234
723 522729 26.8887 .00138313 || 763 582169 27.6225 .00131062
724 524176 26.9072 .00138122 || 764 583696 27.6405 .00130890
725 525625 26.9258 .00137931 || 765 585225 27.6586 .00130719
726 527076 26.9444 .00137741 || 766 586756 27.6767 .00130548
727 528529 26.9629 .00137552 || 767 588289 27.6948 .00130378
728 529984 26.9815 .00137363 || 768 589824 27.7128 .00130208
729 531441 27.0000 .00137174 || 769 591361 27.7308 .00130039
730 532900 27.0185 .00136986 || 770 592900 27.7489 .00129870
731 534361 27.0370 .00136799 || 771 594441 27.7669 .00129702
732 535824 27.0555 .00136612 || 772 595984 27.7849 .00129534
733 537289 27.0740 .00136426 || 773 597529 27.8029 .00129366
734 538756 27.0924 .00136240 || 774 599076 27.8209 .00129199
735 540225 27.1109 .00136054 || 775 600625 27.8388 .00129032
736 541696 27.1293 .00135870 || 776 602176 27.8568 .00128866
737 543169 27.1477 .00135685 || 777 603729 27.8747 .00128700
738 544644 27.1662 .00135501 || 778 605284 27.8927 .00128535
739 546121 27.1846 .00135318 || 779 606841 27.9106 .00128370
740 547600 27.2029 .00135135 || 780 608400 27.9285 .00128205
741 549081 27.2213 .00134953 || 781 609961 27.9464 .00128041
742 550564 27.2397 .00134771 || 782 611524 27.9643 .00127877
743 552049 27.2580 .00134590 || 783 613089 27.9821 .00127714
744 553536 27.2764 .00134409 || 784 614656 28.0000 .00127551
745 555025 27.2947 .00134228 || 785 616225 28.0179 .00127389
746 556516 27.3130 .00134048 || 786 617796 28.0357 .00127226
747 558009 27.3313 .00133869 || 787 619369 28.0535 .00127065
748 559504 27.3496 .00133690 || 788 620944 28.0713 .00126904
749 561001 27.3679 .00133511 || 789 622521 28.0891 .00126743
750 562500 27.3861 .00133333 || 790 624100 28.1069 .00126582
751 564001 27.4044 .00133156 || 791 625681 28.1247 .00126422
752 565504 27.4226 .00132979 || 792 627264 28.1425 .00126263
753 567009 27.4408 .00132802 || 793 628849 28.1603 .00126103
754 568516 27.4591 .00132626 || 794 630436 28.1780 .00125945
755 570025 27.4773 .00132450 || 795 632025 28.1957 .00125786
756 571536 27.4955 .00132275 || 796 633616 28.2135 .00125628
757 573049 27.5136 .00132100 || 797 635209 28.2312 .00125471
758 574564 27.5318 .00131926 || 798 636804 28.2489 .00125313
759 576081 27.5500 .00131752 || 799 638401 28.2666 .00125156
760 577600 27.5681 .00131579 || 800 640000 28.2843 .00125000




TABLE II. Table of Squares, Square Roots, and Reciprocals
of Numbers from 1 to 1,000*—Continued

N N² √N 1/N || N N² √N 1/N


801 641601 28.3019 .00124844 || 841 707281 29.0000 .00118906
802 643204 28.3196 .00124688 || 842 708964 29.0172 .00118765
803 644809 28.3373 .00124533 || 843 710649 29.0345 .00118624
804 646416 28.3549 .00124378 || 844 712336 29.0517 .00118483
805 648025 28.3725 .00124224 || 845 714025 29.0689 .00118343

806 649636 28.3901 .00124069 || 846 715716 29.0861 .00118203
807 651249 28.4077 .00123916 || 847 717409 29.1033 .00118064
808 652864 28.4253 .00123762 || 848 719104 29.1204 .00117925
809 654481 28.4429 .00123609 || 849 720801 29.1376 .00117786
810 656100 28.4605 .00123457 || 850 722500 29.1548 .00117647

811 657721 28.4781 .00123305 || 851 724201 29.1719 .00117509
812 659344 28.4956 .00123153 || 852 725904 29.1890 .00117371
813 660969 28.5132 .00123001 || 853 727609 29.2062 .00117233
814 662596 28.5307 .00122850 || 854 729316 29.2233 .00117096
815 664225 28.5482 .00122699 || 855 731025 29.2404 .00116959

816 665856 28.5657 .00122549 || 856 732736 29.2575 .00116822
817 667489 28.5832 .00122399 || 857 734449 29.2746 .00116686
818 669124 28.6007 .00122249 || 858 736164 29.2916 .00116550
819 670761 28.6182 .00122100 || 859 737881 29.3087 .00116414
820 672400 28.6356 .00121951 || 860 739600 29.3258 .00116279

821 674041 28.6531 .00121803 || 861 741321 29.3428 .00116144
822 675684 28.6705 .00121655 || 862 743044 29.3598 .00116009
823 677329 28.6880 .00121507 || 863 744769 29.3769 .00115875
824 678976 28.7054 .00121359 || 864 746496 29.3939 .00115741
825 680625 28.7228 .00121212 || 865 748225 29.4109 .00115607

826 682276 28.7402 .00121065 || 866 749956 29.4279 .00115473
827 683929 28.7576 .00120919 || 867 751689 29.4449 .00115340
828 685584 28.7750 .00120773 || 868 753424 29.4618 .00115207
829 687241 28.7924 .00120627 || 869 755161 29.4788 .00115075
830 688900 28.8097 .00120482 || 870 756900 29.4958 .00114943

831 690561 28.8271 .00120337 || 871 758641 29.5127 .00114811
832 692224 28.8444 .00120192 || 872 760384 29.5296 .00114679
833 693889 28.8617 .00120048 || 873 762129 29.5466 .00114548
834 695556 28.8791 .00119904 || 874 763876 29.5635 .00114416
835 697225 28.8964 .00119760 || 875 765625 29.5804 .00114286

836 698896 28.9137 .00119617 || 876 767376 29.5973 .00114155
837 700569 28.9310 .00119474 || 877 769129 29.6142 .00114025
838 702244 28.9482 .00119332 || 878 770884 29.6311 .00113895
839 703921 28.9655 .00119190 || 879 772641 29.6479 .00113766
840 705600 28.9828 .00119048 || 880 774400 29.6648 .00113636




TABLE II. Table of Squares, Square Roots, and Reciprocals
of Numbers from 1 to 1,000*—Continued

N N² √N 1/N || N N² √N 1/N
881 776161 29.6816 .00113507 || 921 848241 30.3480 .00108578
882 777924 29.6985 .00113379 || 922 850084 30.3645 .00108460
883 779689 29.7153 .00113250 || 923 851929 30.3809 .00108342
884 781456 29.7321 .00113122 || 924 853776 30.3974 .00108225
885 783225 29.7489 .00112994 || 925 855625 30.4138 .00108108
886 784996 29.7658 .00112867 || 926 857476 30.4302 .00107991
887 786769 29.7825 .00112740 || 927 859329 30.4467 .00107875
888 788544 29.7993 .00112613 || 928 861184 30.4631 .00107759
889 790321 29.8161 .00112486 || 929 863041 30.4795 .00107643
890 792100 29.8329 .00112360 || 930 864900 30.4959 .00107527
891 793881 29.8496 .00112233 || 931 866761 30.5123 .00107411
892 795664 29.8664 .00112108 || 932 868624 30.5287 .00107296
893 797449 29.8831 .00111982 || 933 870489 30.5450 .00107181
894 799236 29.8998 .00111857 || 934 872356 30.5614 .00107066
895 801025 29.9166 .00111732 || 935 874225 30.5778 .00106952
896 802816 29.9333 .00111607 || 936 876096 30.5941 .00106838
897 804609 29.9500 .00111483 || 937 877969 30.6105 .00106724
898 806404 29.9666 .00111359 || 938 879844 30.6268 .00106610
899 808201 29.9833 .00111235 || 939 881721 30.6431 .00106496
900 810000 30.0000 .00111111 || 940 883600 30.6594 .00106383
901 811801 30.0167 .00110988 || 941 885481 30.6757 .00106270
902 813604 30.0333 .00110865 || 942 887364 30.6920 .00106157
903 815409 30.0500 .00110742 || 943 889249 30.7083 .00106045
904 817216 30.0666 .00110619 || 944 891136 30.7246 .00105932
905 819025 30.0832 .00110497 || 945 893025 30.7409 .00105820
906 820836 30.0998 .00110375 || 946 894916 30.7571 .00105708
907 822649 30.1164 .00110254 || 947 896809 30.7734 .00105597
908 824464 30.1330 .00110132 || 948 898704 30.7896 .00105485
909 826281 30.1496 .00110011 || 949 900601 30.8058 .00105374
910 828100 30.1662 .00109890 || 950 902500 30.8221 .00105263
911 829921 30.1828 .00109769 || 951 904401 30.8383 .00105152
912 831744 30.1993 .00109649 || 952 906304 30.8545 .00105042
913 833569 30.2159 .00109529 || 953 908209 30.8707 .00104932
914 835396 30.2324 .00109409 || 954 910116 30.8869 .00104822
915 837225 30.2490 .00109290 || 955 912025 30.9031 .00104712
916 839056 30.2655 .00109170 || 956 913936 30.9192 .00104603
917 840889 30.2820 .00109051 || 957 915849 30.9354 .00104493
918 842724 30.2985 .00108932 || 958 917764 30.9516 .00104384
919 844561 30.3150 .00108814 || 959 919681 30.9677 .00104275
920 846400 30.3315 .00108696 || 960 921600 30.9839 .00104167




TABLE II. Table of Squares, Square Roots, and Reciprocals 
of Numbers from 1 to 1,000*—Concluded 


N N² √N 1/N || N N² √N 1/N


961 923521 31.0000 .00104058 || 981 962361 31.3209 .00101937
962 925444 31.0161 .00103950 || 982 964324 31.3369 .00101833
963 927369 31.0322 .00103842 || 983 966289 31.3528 .00101729
964 929296 31.0483 .00103734 || 984 968256 31.3688 .00101626
965 931225 31.0644 .00103627 || 985 970225 31.3847 .00101523

966 933156 31.0805 .00103520 || 986 972196 31.4006 .00101420
967 935089 31.0966 .00103413 || 987 974169 31.4166 .00101317
968 937024 31.1127 .00103306 || 988 976144 31.4325 .00101215
969 938961 31.1288 .00103199 || 989 978121 31.4484 .00101112
970 940900 31.1448 .00103093 || 990 980100 31.4643 .00101010

971 942841 31.1609 .00102987 || 991 982081 31.4802 .00100908
972 944784 31.1769 .00102881 || 992 984064 31.4960 .00100806
973 946729 31.1929 .00102775 || 993 986049 31.5119 .00100705
974 948676 31.2090 .00102669 || 994 988036 31.5278 .00100604
975 950625 31.2250 .00102564 || 995 990025 31.5436 .00100503

976 952576 31.2410 .00102459 || 996 992016 31.5595 .00100402
977 954529 31.2570 .00102354 || 997 994009 31.5753 .00100301
978 956484 31.2730 .00102249 || 998 996004 31.5911 .00100200
979 958441 31.2890 .00102145 || 999 998001 31.6070 .00100100
980 960400 31.3050 .00102041 || 1000 1000000 31.6228 .00100000


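Each entry of Table II follows directly from its definition, and each entry of Table III, which follows, can be computed from the unit normal curve. The Python sketch below is one way to check a printed value or extend either table. It assumes that column A of Table III is the area between the mean and z, that is 0.5·erf(z/√2), and that y is the unit normal ordinate e^(−z²/2)/√(2π). The function names `table_ii_row` and `table_iii_row` are hypothetical helpers for illustration, not part of the original text.

```python
import math


def table_ii_row(n: int) -> str:
    """Return one Table II entry: N, N squared, the square root of N to
    four decimals, and 1/N to eight decimals (leading zero dropped)."""
    if n < 1:
        raise ValueError("Table II covers positive integers only")
    recip = f"{1 / n:.8f}"
    if recip.startswith("0"):
        recip = recip[1:]  # print .00414938 rather than 0.00414938
    return f"{n} {n * n} {math.sqrt(n):.4f} {recip}"


def _four_places(value: float) -> str:
    """Format a proportion the way Table III prints it (no leading zero)."""
    return f"{value:.4f}".lstrip("0")


def table_iii_row(z: float) -> str:
    """Return one Table III entry for a standard score z = x/sigma >= 0.

    A: area between the mean and z
    B: area in the larger portion (0.5 + A)
    C: area in the smaller portion (0.5 - A)
    y: ordinate of the unit normal curve at z
    """
    if z < 0:
        raise ValueError("Table III lists non-negative standard scores only")
    area_a = 0.5 * math.erf(z / math.sqrt(2.0))
    ordinate = math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)
    columns = (area_a, 0.5 + area_a, 0.5 - area_a, ordinate)
    return f"{z:.2f} " + " ".join(_four_places(v) for v in columns)


if __name__ == "__main__":
    for n in (241, 256, 1000):
        print(table_ii_row(n))
    for z in (0.0, 1.0, 1.96):
        print(table_iii_row(z))
```

For example, `table_ii_row(241)` gives `241 58081 15.5242 .00414938`, and `table_iii_row(1.96)` gives `1.96 .4750 .9750 .0250 .0584`. Because the script relies on Python's default rounding, a value lying very close to a rounding boundary could occasionally differ from the printed table in its last digit.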




TABLE III. Areas and Ordinates of the Normal Curve in Terms of x/σ

(1)          (2)           (3)          (4)          (5)
z            A             B            C            y
STANDARD     AREA FROM     AREA IN      AREA IN      ORDINATE
SCORE (x/σ)  MEAN TO x/σ   LARGER       SMALLER      AT x/σ
                           PORTION      PORTION
0.00 .0000 .5000 .5000 .3989
0.01 .0040 .5040 .4960 .3989
0.02 .0080 .5080 .4920 .3989
0.03 .0120 .5120 .4880 .3988
0.04 .0160 .5160 .4840 .3986
0.05 .0199 .5199 .4801 .3984
0.06 .0239 .5239 .4761 .3982
0.07 .0279 .5279 .4721 .3980
0.08 .0319 .5319 .4681 .3977
0.09 .0359 .5359 .4641 .3973
0.10 .0398 .5398 .4602 .3970
0.11 .0438 .5438 .4562 .3965
0.12 .0478 .5478 .4522 .3961
0.13 .0517 .5517 .4483 .3956
0.14 .0557 .5557 .4443 .3951
0.15 .0596 .5596 .4404 .3945
0.16 .0636 .5636 .4364 .3939
0.17 .0675 .5675 .4325 .3932
0.18 .0714 .5714 .4286 .3925
0.19 .0753 .5753 .4247 .3918
0.20 .0793 .5793 .4207 .3910
0.21 .0832 .5832 .4168 .3902
0.22 .0871 .5871 .4129 .3894
0.23 .0910 .5910 .4090 .3885
0.24 .0948 .5948 .4052 .3876
0.25 .0987 .5987 .4013 .3867
0.26 .1026 .6026 .3974 .3857
0.27 .1064 .6064 .3936 .3847
0.28 .1103 .6103 .3897 .3836
0.29 .1141 .6141 .3859 .3825
0.30 .1179 .6179 .3821 .3814
0.31 .1217 .6217 .3783 .3802
0.32 .1255 .6255 .3745 .3790
0.33 .1293 .6293 .3707 .3778
0.34 .1331 .6331 .3669 .3765




TABLE III. Areas and Ordinates of the Normal Curve in Terms of x/σ—Continued

(1)          (2)           (3)          (4)          (5)
z            A             B            C            y
STANDARD     AREA FROM     AREA IN      AREA IN      ORDINATE
SCORE (x/σ)  MEAN TO x/σ   LARGER       SMALLER      AT x/σ
                           PORTION      PORTION
0.35 .1368 .6368 .3632 .3752
0.36 .1406 .6406 .3594 .3739
0.37 .1443 .6443 .3557 .3725
0.38 .1480 .6480 .3520 .3712
0.39 .1517 .6517 .3483 .3697
0.40 .1554 .6554 .3446 .3683
0.41 .1591 .6591 .3409 .3668
0.42 .1628 .6628 .3372 .3653
0.43 .1664 .6664 .3336 .3637
0.44 .1700 .6700 .3300 .3621
0.45 .1736 .6736 .3264 .3605
0.46 .1772 .6772 .3228 .3589
0.47 .1808 .6808 .3192 .3572
0.48 .1844 .6844 .3156 .3555
0.49 .1879 .6879 .3121 .3538
0.50 .1915 .6915 .3085 .3521
0.51 .1950 .6950 .3050 .3503
0.52 .1985 .6985 .3015 .3485
0.53 .2019 .7019 .2981 .3467
0.54 .2054 .7054 .2946 .3448
0.55 .2088 .7088 .2912 .3429
0.56 .2123 .7123 .2877 .3410
0.57 .2157 .7157 .2843 .3391
0.58 .2190 .7190 .2810 .3372
0.59 .2224 .7224 .2776 .3352
0.60 .2257 .7257 .2743 .3332
0.61 .2291 .7291 .2709 .3312
0.62 .2324 .7324 .2676 .3292
0.63 .2357 .7357 .2643 .3271
0.64 .2389 .7389 .2611 .3251
0.65 .2422 .7422 .2578 .3230
0.66 .2454 .7454 .2546 .3209
0.67 .2486 .7486 .2514 .3187
0.68 .2517 .7517 .2483 .3166
0.69 .2549 .7549 .2451 .3144


TABLE III. Areas and Ordinates of the Normal Curve in Terms of x/σ—Continued

(1)          (2)           (3)          (4)          (5)
z            A             B            C            y
STANDARD     AREA FROM     AREA IN      AREA IN      ORDINATE
SCORE (x/σ)  MEAN TO x/σ   LARGER       SMALLER      AT x/σ
                           PORTION      PORTION
0.70 .2580 .7580 .2420 .3123
0.71 .2611 .7611 .2389 .3101
0.72 .2642 .7642 .2358 .3079
0.73 .2673 .7673 .2327 .3056
0.74 .2704 .7704 .2296 .3034
0.75 .2734 .7734 .2266 .3011
0.76 .2764 .7764 .2236 .2989
0.77 .2794 .7794 .2206 .2966
0.78 .2823 .7823 .2177 .2943
0.79 .2852 .7852 .2148 .2920
0.80 .2881 .7881 .2119 .2897
0.81 .2910 .7910 .2090 .2874
0.82 .2939 .7939 .2061 .2850
0.83 .2967 .7967 .2033 .2827
0.84 .2995 .7995 .2005 .2803
0.85 .3023 .8023 .1977 .2780
0.86 .3051 .8051 .1949 .2756
0.87 .3078 .8078 .1922 .2732
0.88 .3106 .8106 .1894 .2709
0.89 .3133 .8133 .1867 .2685
0.90 .3159 .8159 .1841 .2661
0.91 .3186 .8186 .1814 .2637
0.92 .3212 .8212 .1788 .2613
0.93 .3238 .8238 .1762 .2589
0.94 .3264 .8264 .1736 .2565
0.95 .3289 .8289 .1711 .2541
0.96 .3315 .8315 .1685 .2516
0.97 .3340 .8340 .1660 .2492
0.98 .3365 .8365 .1635 .2468
0.99 .3389 .8389 .1611 .2444
1.00 .3413 .8413 .1587 .2420
1.01 .3438 .8438 .1562 .2396
1.02 .3461 .8461 .1539 .2371
1.03 .3485 .8485 .1515 .2347
1.04 .3508 .8508 .1492 .2323




TABLE III. Areas and Ordinates of the Normal Curve in Terms of x/σ—Continued

(1)          (2)           (3)          (4)          (5)
z            A             B            C            y
STANDARD     AREA FROM     AREA IN      AREA IN      ORDINATE
SCORE (x/σ)  MEAN TO x/σ   LARGER       SMALLER      AT x/σ
                           PORTION      PORTION
1.05 .3531 .8531 .1469 .2299
1.06 .3554 .8554 .1446 .2275
1.07 .3577 .8577 .1423 .2251
1.08 .3599 .8599 .1401 .2227
1.09 .3621 .8621 .1379 .2203
1.10 .3643 .8643 .1357 .2179
1.11 .3665 .8665 .1335 .2155
1.12 .3686 .8686 .1314 .2131
1.13 .3708 .8708 .1292 .2107
1.14 .3729 .8729 .1271 .2083
1.15 .3749 .8749 .1251 .2059
1.16 .3770 .8770 .1230 .2036
1.17 .3790 .8790 .1210 .2012
1.18 .3810 .8810 .1190 .1989
1.19 .3830 .8830 .1170 .1965
1.20 .3849 .8849 .1151 .1942
1.21 .3869 .8869 .1131 .1919
1.22 .3888 .8888 .1112 .1895
1.23 .3907 .8907 .1093 .1872
1.24 .3925 .8925 .1075 .1849
1.25 .3944 .8944 .1056 .1826
1.26 .3962 .8962 .1038 .1804
1.27 .3980 .8980 .1020 .1781
1.28 .3997 .8997 .1003 .1758
1.29 .4015 .9015 .0985 .1736
1.30 .4032 .9032 .0968 .1714
1.31 .4049 .9049 .0951 .1691
1.32 .4066 .9066 .0934 .1669
1.33 .4082 .9082 .0918 .1647
1.34 .4099 .9099 .0901 .1626
1.35 .4115 .9115 .0885 .1604
1.36 .4131 .9131 .0869 .1582
1.37 .4147 .9147 .0853 .1561
1.38 .4162 .9162 .0838 .1539
1.39 .4177 .9177 .0823 .1518




TABLE III. Areas and Ordinates of the Normal Curve in Terms of x/σ—Continued

(1)          (2)           (3)          (4)          (5)
z            A             B            C            y
STANDARD     AREA FROM     AREA IN      AREA IN      ORDINATE
SCORE (x/σ)  MEAN TO x/σ   LARGER       SMALLER      AT x/σ
                           PORTION      PORTION
1.40 .4192 .9192 .0808 .1497
1.41 .4207 .9207 .0793 .1476
1.42 .4222 .9222 .0778 .1456
1.43 .4236 .9236 .0764 .1435
1.44 .4251 .9251 .0749 .1415
1.45 .4265 .9265 .0735 .1394
1.46 .4279 .9279 .0721 .1374
1.47 .4292 .9292 .0708 .1354
1.48 .4306 .9306 .0694 .1334
1.49 .4319 .9319 .0681 .1315
1.50 .4332 .9332 .0668 .1295
1.51 .4345 .9345 .0655 .1276
1.52 .4357 .9357 .0643 .1257
1.53 .4370 .9370 .0630 .1238
1.54 .4382 .9382 .0618 .1219
1.55 .4394 .9394 .0606 .1200
1.56 .4406 .9406 .0594 .1182
1.57 .4418 .9418 .0582 .1163
1.58 .4429 .9429 .0571 .1145
1.59 .4441 .9441 .0559 .1127
1.60 .4452 .9452 .0548 .1109
1.61 .4463 .9463 .0537 .1092
1.62 .4474 .9474 .0526 .1074
1.63 .4484 .9484 .0516 .1057
1.64 .4495 .9495 .0505 .1040
1.65 .4505 .9505 .0495 .1023
1.66 .4515 .9515 .0485 .1006
1.67 .4525 .9525 .0475 .0989
1.68 .4535 .9535 .0465 .0973
1.69 .4545 .9545 .0455 .0957
1.70 .4554 .9554 .0446 .0940
1.71 .4564 .9564 .0436 .0925
1.72 .4573 .9573 .0427 .0909
1.73 .4582 .9582 .0418 .0893
1.74 .4591 .9591 .0409 .0878




TABLE III. Areas and Ordinates of the Normal Curve in Terms of x/σ—Continued

(1) z: STANDARD SCORE (x/σ)   (2) A: AREA FROM MEAN TO z   (3) B: AREA IN LARGER PORTION   (4) C: AREA IN SMALLER PORTION   (5) y: ORDINATE AT z

z     A      B      C      y
1.75  .4599  .9599  .0401  .0863
1.76  .4608  .9608  .0392  .0848
1.77  .4616  .9616  .0384  .0833
1.78  .4625  .9625  .0375  .0818
1.79  .4633  .9633  .0367  .0804
1.80  .4641  .9641  .0359  .0790
1.81  .4649  .9649  .0351  .0775
1.82  .4656  .9656  .0344  .0761
1.83  .4664  .9664  .0336  .0748
1.84  .4671  .9671  .0329  .0734
1.85  .4678  .9678  .0322  .0721
1.86  .4686  .9686  .0314  .0707
1.87  .4693  .9693  .0307  .0694
1.88  .4699  .9699  .0301  .0681
1.89  .4706  .9706  .0294  .0669
1.90  .4713  .9713  .0287  .0656
1.91  .4719  .9719  .0281  .0644
1.92  .4726  .9726  .0274  .0632
1.93  .4732  .9732  .0268  .0620
1.94  .4738  .9738  .0262  .0608
1.95  .4744  .9744  .0256  .0596
1.96  .4750  .9750  .0250  .0584
1.97  .4756  .9756  .0244  .0573
1.98  .4761  .9761  .0239  .0562
1.99  .4767  .9767  .0233  .0551
2.00  .4772  .9772  .0228  .0540
2.01  .4778  .9778  .0222  .0529
2.02  .4783  .9783  .0217  .0519
2.03  .4788  .9788  .0212  .0508
2.04  .4793  .9793  .0207  .0498
2.05  .4798  .9798  .0202  .0488
2.06  .4803  .9803  .0197  .0478
2.07  .4808  .9808  .0192  .0468
2.08  .4812  .9812  .0188  .0459
2.09  .4817  .9817  .0183  .0449




TABLE III. Areas and Ordinates of the Normal Curve in Terms of x/σ—Continued

(1) z: STANDARD SCORE (x/σ)   (2) A: AREA FROM MEAN TO z   (3) B: AREA IN LARGER PORTION   (4) C: AREA IN SMALLER PORTION   (5) y: ORDINATE AT z

z     A      B      C      y
2.10  .4821  .9821  .0179  .0440
2.11  .4826  .9826  .0174  .0431
2.12  .4830  .9830  .0170  .0422
2.13  .4834  .9834  .0166  .0413
2.14  .4838  .9838  .0162  .0404
2.15  .4842  .9842  .0158  .0396
2.16  .4846  .9846  .0154  .0387
2.17  .4850  .9850  .0150  .0379
2.18  .4854  .9854  .0146  .0371
2.19  .4857  .9857  .0143  .0363
2.20  .4861  .9861  .0139  .0355
2.21  .4864  .9864  .0136  .0347
2.22  .4868  .9868  .0132  .0339
2.23  .4871  .9871  .0129  .0332
2.24  .4875  .9875  .0125  .0325
2.25  .4878  .9878  .0122  .0317
2.26  .4881  .9881  .0119  .0310
2.27  .4884  .9884  .0116  .0303
2.28  .4887  .9887  .0113  .0297
2.29  .4890  .9890  .0110  .0290
2.30  .4893  .9893  .0107  .0283
2.31  .4896  .9896  .0104  .0277
2.32  .4898  .9898  .0102  .0270
2.33  .4901  .9901  .0099  .0264
2.34  .4904  .9904  .0096  .0258
2.35  .4906  .9906  .0094  .0252
2.36  .4909  .9909  .0091  .0246
2.37  .4911  .9911  .0089  .0241
2.38  .4913  .9913  .0087  .0235
2.39  .4916  .9916  .0084  .0229
2.40  .4918  .9918  .0082  .0224
2.41  .4920  .9920  .0080  .0219
2.42  .4922  .9922  .0078  .0213
2.43  .4925  .9925  .0075  .0208
2.44  .4927  .9927  .0073  .0203




TABLE III. Areas and Ordinates of the Normal Curve in Terms of x/σ—Continued

(1) z: STANDARD SCORE (x/σ)   (2) A: AREA FROM MEAN TO z   (3) B: AREA IN LARGER PORTION   (4) C: AREA IN SMALLER PORTION   (5) y: ORDINATE AT z

z     A      B      C      y
2.45  .4929  .9929  .0071  .0198
2.46  .4931  .9931  .0069  .0194
2.47  .4932  .9932  .0068  .0189
2.48  .4934  .9934  .0066  .0184
2.49  .4936  .9936  .0064  .0180
2.50  .4938  .9938  .0062  .0175
2.51  .4940  .9940  .0060  .0171
2.52  .4941  .9941  .0059  .0167
2.53  .4943  .9943  .0057  .0163
2.54  .4945  .9945  .0055  .0158
2.55  .4946  .9946  .0054  .0154
2.56  .4948  .9948  .0052  .0151
2.57  .4949  .9949  .0051  .0147
2.58  .4951  .9951  .0049  .0143
2.59  .4952  .9952  .0048  .0139
2.60  .4953  .9953  .0047  .0136
2.61  .4955  .9955  .0045  .0132
2.62  .4956  .9956  .0044  .0129
2.63  .4957  .9957  .0043  .0126
2.64  .4959  .9959  .0041  .0122
2.65  .4960  .9960  .0040  .0119
2.66  .4961  .9961  .0039  .0116
2.67  .4962  .9962  .0038  .0113
2.68  .4963  .9963  .0037  .0110
2.69  .4964  .9964  .0036  .0107
2.70  .4965  .9965  .0035  .0104
2.71  .4966  .9966  .0034  .0101
2.72  .4967  .9967  .0033  .0099
2.73  .4968  .9968  .0032  .0096
2.74  .4969  .9969  .0031  .0093
2.75  .4970  .9970  .0030  .0091
2.76  .4971  .9971  .0029  .0088
2.77  .4972  .9972  .0028  .0086
2.78  .4973  .9973  .0027  .0084
2.79  .4974  .9974  .0026  .0081




TABLE III. Areas and Ordinates of the Normal Curve in Terms of x/σ—Continued

(1) z: STANDARD SCORE (x/σ)   (2) A: AREA FROM MEAN TO z   (3) B: AREA IN LARGER PORTION   (4) C: AREA IN SMALLER PORTION   (5) y: ORDINATE AT z

z     A      B      C      y
2.80  .4974  .9974  .0026  .0079
2.81  .4975  .9975  .0025  .0077
2.82  .4976  .9976  .0024  .0075
2.83  .4977  .9977  .0023  .0073
2.84  .4977  .9977  .0023  .0071
2.85  .4978  .9978  .0022  .0069
2.86  .4979  .9979  .0021  .0067
2.87  .4979  .9979  .0021  .0065
2.88  .4980  .9980  .0020  .0063
2.89  .4981  .9981  .0019  .0061
2.90  .4981  .9981  .0019  .0060
2.91  .4982  .9982  .0018  .0058
2.92  .4982  .9982  .0018  .0056
2.93  .4983  .9983  .0017  .0055
2.94  .4984  .9984  .0016  .0053
2.95  .4984  .9984  .0016  .0051
2.96  .4985  .9985  .0015  .0050
2.97  .4985  .9985  .0015  .0048
2.98  .4986  .9986  .0014  .0047
2.99  .4986  .9986  .0014  .0046
3.00  .4987  .9987  .0013  .0044
3.01  .4987  .9987  .0013  .0043
3.02  .4987  .9987  .0013  .0042
3.03  .4988  .9988  .0012  .0040
3.04  .4988  .9988  .0012  .0039
3.05  .4989  .9989  .0011  .0038
3.06  .4989  .9989  .0011  .0037
3.07  .4989  .9989  .0011  .0036
3.08  .4990  .9990  .0010  .0035
3.09  .4990  .9990  .0010  .0034
3.10  .4990  .9990  .0010  .0033
3.11  .4991  .9991  .0009  .0032
3.12  .4991  .9991  .0009  .0031
3.13  .4991  .9991  .0009  .0030
3.14  .4992  .9992  .0008  .0029




TABLE III. Areas and Ordinates of the Normal Curve in Terms of x/σ—Concluded

(1) z: STANDARD SCORE (x/σ)   (2) A: AREA FROM MEAN TO z   (3) B: AREA IN LARGER PORTION   (4) C: AREA IN SMALLER PORTION   (5) y: ORDINATE AT z

z     A      B      C      y
3.15  .4992  .9992  .0008  .0028
3.16  .4992  .9992  .0008  .0027
3.17  .4992  .9992  .0008  .0026
3.18  .4993  .9993  .0007  .0025
3.19  .4993  .9993  .0007  .0025
3.20  .4993  .9993  .0007  .0024
3.21  .4993  .9993  .0007  .0023
3.22  .4994  .9994  .0006  .0022
3.23  .4994  .9994  .0006  .0022
3.24  .4994  .9994  .0006  .0021
3.30  .4995  .9995  .0005  .0017
3.40  .4997  .9997  .0003  .0012
3.50  .4998  .9998  .0002  .0009
3.60  .4998  .9998  .0002  .0006
3.70  .4999  .9999  .0001  .0004
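The entries of Table III can be regenerated from the unit normal curve. The following is a minimal sketch, assuming Python's standard-library `math.erf` for the area and the ordinate formula y = e^(−z²/2)/√(2π). It also uses the relations visible in the table, B = .5 + A and C = .5 − A. The function name `normal_table_row` is illustrative only.

```python
import math

def normal_table_row(z):
    """Return (A, B, C, y) for a standard score z >= 0, laid out as in Table III."""
    a = 0.5 * math.erf(z / math.sqrt(2.0))                  # A: area from the mean to z
    b = 0.5 + a                                              # B: area in the larger portion
    c = 0.5 - a                                              # C: area in the smaller portion
    y = math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)    # y: ordinate of the unit normal curve
    return a, b, c, y

for z in (1.64, 1.96, 2.58):
    print(f"{z:.2f}", " ".join(f"{v:.4f}" for v in normal_table_row(z)))
```

Rounded to four places, the printed rows should agree with the tabled rows for 1.64, 1.96, and 2.58. Python prints them with leading zeros.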




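TABLE IV. Table of χ²*

DEGREES OF FREEDOM (df)

 df   P = .99   P = .98   P = .20   P = .10   P = .05   P = .02   P = .01
  1   .000157   .000628     1.642     2.706     3.841     5.412     6.635
  2     .0201     .0404     3.219     4.605     5.991     7.824     9.210
  3      .115      .185     4.642     6.251     7.815     9.837    11.345
  4      .297      .429     5.989     7.779     9.488    11.668    13.277
  5      .554      .752     7.289     9.236    11.070    13.388    15.086
  6      .872     1.134     8.558    10.645    12.592    15.033    16.812
  7     1.239     1.564     9.803    12.017    14.067    16.622    18.475
  8     1.646     2.032    11.030    13.362    15.507    18.168    20.090
  9     2.088     2.532    12.242    14.684    16.919    19.679    21.666
 10     2.558     3.059    13.442    15.987    18.307    21.161    23.209
 11     3.053     3.609    14.631    17.275    19.675    22.618    24.725
 12     3.571     4.178    15.812    18.549    21.026    24.054    26.217
 13     4.107     4.765    16.985    19.812    22.362    25.472    27.688
 14     4.660     5.368    18.151    21.064    23.685    26.873    29.141
 15     5.229     5.985    19.311    22.307    24.996    28.259    30.578
 16     5.812     6.614    20.465    23.542    26.296    29.633    32.000
 17     6.408     7.255    21.615    24.769    27.587    30.995    33.409
 18     7.015     7.906    22.760    25.989    28.869    32.346    34.805
 19     7.633     8.567    23.900    27.204    30.144    33.687    36.191
 20     8.260     9.237    25.038    28.412    31.410    35.020    37.566
 21     8.897     9.915    26.171    29.615    32.671    36.343    38.932
 22     9.542    10.600    27.301    30.813    33.924    37.659    40.289
 23    10.196    11.293    28.429    32.007    35.172    38.968    41.638
 24    10.856    11.992    29.553    33.196    36.415    40.270    42.980
 25    11.524    12.697    30.675    34.382    37.652    41.566    44.314
 26    12.198    13.409    31.795    35.563    38.885    42.856    45.642
 27    12.879    14.125    32.912    36.741    40.113    44.140    46.963
 28    13.565    14.847    34.027    37.916    41.337    45.419    48.278
 29    14.256    15.574    35.139    39.087    42.557    46.693    49.588
 30    14.953    16.306    36.250    40.256    43.773    47.962    50.892

[Columns for P = .95, .90, .80, .70, .50, and .30 are not legible in this copy.]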


* Table IV is reprinted from Table III of Fisher: Statistical Methods for Research Workers, Oliver & Boyd Ltd., Edinburgh, by permission of the author and publishers.

For larger values of df, the expression √(2χ²) − √(2df − 1) may be used as a normal deviate with unit standard error.
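The footnote's large-sample rule can be sketched as follows. As a rough check at 30 df, the tabled 5 per cent point 43.773 gives a deviate of about 1.68. That is close to the one-tailed normal 5 per cent point of 1.645. The second case, a χ² of 60.0 with 40 df, is hypothetical. The deviate is referred to the upper tail of the normal curve, column C of Table III.

```python
import math

def chi_square_normal_deviate(chi2, df):
    """Large-df approximation: sqrt(2*chi2) - sqrt(2*df - 1) is roughly a unit normal deviate."""
    return math.sqrt(2.0 * chi2) - math.sqrt(2.0 * df - 1.0)

# Check against the tabled 5 per cent point for 30 df
print(round(chi_square_normal_deviate(43.773, 30), 3))
# Hypothetical case: chi-square = 60.0 with 40 df
print(round(chi_square_normal_deviate(60.0, 40), 3))
```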




TABLE V. Table of t* 


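       Probability (P)
 df      .9      .8      .7      .6      .5      .4      .3      .2      .1     .05     .02     .01
  1    .158    .325    .510    .727   1.000   1.376   1.963   3.078   6.314  12.706  31.821  63.657
  2    .142    .289    .445    .617    .816   1.061   1.386   1.886   2.920   4.303   6.965   9.925
  3    .137    .277    .424    .584    .765    .978   1.250   1.638   2.353   3.182   4.541   5.841
  4    .134    .271    .414    .569    .741    .941   1.190   1.533   2.132   2.776   3.747   4.604
  5    .132    .267    .408    .559    .727    .920   1.156   1.476   2.015   2.571   3.365   4.032
  6    .131    .265    .404    .553    .718    .906   1.134   1.440   1.943   2.447   3.143   3.707
  7    .130    .263    .402    .549    .711    .896   1.119   1.415   1.895   2.365   2.998   3.499
  8    .130    .262    .399    .546    .706    .889   1.108   1.397   1.860   2.306   2.896   3.355
  9    .129    .261    .398    .543    .703    .883   1.100   1.383   1.833   2.262   2.821   3.250
 10    .129    .260    .397    .542    .700    .879   1.093   1.372   1.812   2.228   2.764   3.169
 11    .129    .260    .396    .540    .697    .876   1.088   1.363   1.796   2.201   2.718   3.106
 12    .128    .259    .395    .539    .695    .873   1.083   1.356   1.782   2.179   2.681   3.055
 13    .128    .259    .394    .538    .694    .870   1.079   1.350   1.771   2.160   2.650   3.012
 14    .128    .258    .393    .537    .692    .868   1.076   1.345   1.761   2.145   2.624   2.977
 15    .128    .258    .393    .536    .691    .866   1.074   1.341   1.753   2.131   2.602   2.947
 16    .128    .258    .392    .535    .690    .865   1.071   1.337   1.746   2.120   2.583   2.921
 17    .128    .257    .392    .534    .689    .863   1.069   1.333   1.740   2.110   2.567   2.898
 18    .127    .257    .392    .534    .688    .862   1.067   1.330   1.734   2.101   2.552   2.878
 19    .127    .257    .391    .533    .688    .861   1.066   1.328   1.729   2.093   2.539   2.861
 20    .127    .257    .391    .533    .687    .860   1.064   1.325   1.725   2.086   2.528   2.845
 21    .127    .257    .391    .532    .686    .859   1.063   1.323   1.721   2.080   2.518   2.831
 22    .127    .256    .390    .532    .686    .858   1.061   1.321   1.717   2.074   2.508   2.819
 23    .127    .256    .390    .532    .685    .858   1.060   1.319   1.714   2.069   2.500   2.807
 24    .127    .256    .390    .531    .685    .857   1.059   1.318   1.711   2.064   2.492   2.797
 25    .127    .256    .390    .531    .684    .856   1.058   1.316   1.708   2.060   2.485   2.787
 26    .127    .256    .390    .531    .684    .856   1.058   1.315   1.706   2.056   2.479   2.779
 27    .127    .256    .389    .531    .684    .855   1.057   1.314   1.703   2.052   2.473   2.771
 28    .127    .256    .389    .530    .683    .855   1.056   1.313   1.701   2.048   2.467   2.763
 29    .127    .256    .389    .530    .683    .854   1.055   1.311   1.699   2.045   2.462   2.756
 30    .127    .256    .389    .530    .683    .854   1.055   1.310   1.697   2.042   2.457   2.750
  ∞  .12566  .25335  .38532  .52440  .67449  .84162 1.03643 1.28155 1.64485 1.95996 2.32634 2.57582


Additional Values of t at the 5 and the 1 Per Cent Levels of Significance†

 df      5%         df      5%         df      5%
 32                 55   2.005        125   1.979
 34   2.032         60   2.000        150   1.976
 36   2.027         65   1.998        175   1.974
 38   2.025         70   1.994        200   1.972
 40   2.021         75   1.992        300   1.968
 42                 80   1.990        400   1.966
 44                 85   1.989        500   1.965
 46   2.012         90   1.987       1000   1.962
 48   2.010         95                  ∞   1.960
 50   2.008        100

[The 1 per cent column and the blank 5 per cent entries are not legible in this copy.]


* Table V is reprinted from Table IV of Fisher: Statistical Methods for Research Workers, Oliver &
Boyd Ltd., Edinburgh, by permission of the author and publishers.
† Additional entries were taken from Snedecor: Statistical Methods, Iowa State College Press,
Ames, Iowa, by permission of the author and publisher. Values for 75, 85, 95, and 175 degrees of freedom
were obtained by linear interpolation.
The probabilities given are for a two-tailed test of significance. For a one-tailed test of significance,
the tabled probabilities should be halved.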
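The footnote states that the entries for 75, 85, 95, and 175 df were obtained by linear interpolation. The sketch below applies straight-line interpolation to neighbouring tabled values and recovers the printed entries for 75 and 175 df. The function name `interpolate` is illustrative only.

```python
def interpolate(df, df_low, t_low, df_high, t_high):
    """Straight-line interpolation between two tabled values of t."""
    fraction = (df - df_low) / (df_high - df_low)
    return t_low + fraction * (t_high - t_low)

print(round(interpolate(75, 70, 1.994, 80, 1.990), 3))      # tabled value for 75 df: 1.992
print(round(interpolate(175, 150, 1.976, 200, 1.972), 3))   # tabled value for 175 df: 1.974
```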




TABLE VI. Values of r at the 5 and the 1 Per Cent Levels of Significance* 


  df       5%      1%           df       5%      1%
   1     .997   1.000           24     .388    .496
   2     .950    .990           25     .381    .487
   3     .878    .959           26     .374    .478
   4     .811    .917           27     .367    .470
   5     .754    .874           28     .361    .463
   6     .707    .834           29     .355    .456
   7     .666    .798           30     .349    .449
   8     .632    .765           35     .325    .418
   9     .602    .735           40     .304    .393
  10     .576    .708           45     .288    .372
  11     .553    .684           50     .273    .354
  12     .532    .661           60     .250    .325
  13     .514    .641           70     .232    .302
  14     .497    .623           80     .217    .283
  15     .482    .606           90     .205    .267
  16     .468    .590          100     .195    .254
  17     .456    .575          125     .174    .228
  18     .444    .561          150     .159    .208
  19     .433    .549          200     .138    .181
  20     .423    .537          300     .113    .148
  21     .413    .526          400     .098    .128
  22     .404    .515          500     .088    .115
  23     .396    .505         1000     .062    .081


* Table VI is abridged from Table V.A of Fisher: Statistical Methods for Research
Workers, Oliver & Boyd Ltd., Edinburgh, by permission of the author and publishers. Additional
entries were taken from Snedecor: Statistical Methods, Iowa State College Press, Ames,
Iowa, by permission of the author and publisher.

The probabilities given are for a two-tailed test of significance, i.e., with the sign of r
ignored.
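Table VI does not say how its entries were derived. One standard relation converts a tabled value of t with df = N − 2 into the equivalent r: r = t/√(t² + df). This relation is assumed here and is not taken from the text. With the two-tailed 5 and 1 per cent points of t for 10 df (2.228 and 3.169), it reproduces the tabled .576 and .708. The helper name `critical_r` is hypothetical.

```python
import math

def critical_r(t, df):
    """Value of r equivalent to a value of t with df = N - 2: r = t / sqrt(t**2 + df)."""
    return t / math.sqrt(t * t + df)

# 5 and 1 per cent points of t for 10 df (two-tailed)
print(round(critical_r(2.228, 10), 3), round(critical_r(3.169, 10), 3))
```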




TABLE VII. Table of z' Values for r* 


r     z'      r     z'      r     z'      r     z'      r     z'
.000  .000    .200  .203    .400  .424    .600  .693    .800  1.099
.005  .005    .205  .208    .405  .430    .605  .701    .805  1.113
.010  .010    .210  .213    .410  .436    .610  .709    .810  1.127
.015  .015    .215  .218    .415  .442    .615  .717    .815  1.142
.020  .020    .220  .224    .420  .448    .620  .725    .820  1.157
.025  .025    .225  .229    .425  .454    .625  .733    .825  1.172
.030  .030    .230  .234    .430  .460    .630  .741    .830  1.188
.035  .035    .235  .239    .435  .466    .635  .750    .835  1.204
.040  .040    .240  .245    .440  .472    .640  .758    .840  1.221
.045  .045    .245  .250    .445  .478    .645  .767    .845  1.238
.050  .050    .250  .255    .450  .485    .650  .775    .850  1.256
.055  .055    .255  .261    .455  .491    .655  .784    .855  1.274
.060  .060    .260  .266    .460  .497    .660  .793    .860  1.293
.065  .065    .265  .271    .465  .504    .665  .802    .865  1.313
.070  .070    .270  .277    .470  .510    .670  .811    .870  1.333
.075  .075    .275  .282    .475  .517    .675  .820    .875  1.354
.080  .080    .280  .288    .480  .523    .680  .829    .880  1.376
.085  .085    .285  .293    .485  .530    .685  .838    .885  1.398
.090  .090    .290  .299    .490  .536    .690  .848    .890  1.422
.095  .095    .295  .304    .495  .543    .695  .858    .895  1.447
.100  .100    .300  .310    .500  .549    .700  .867    .900  1.472
.105  .105    .305  .315    .505  .556    .705  .877    .905  1.499
.110  .110    .310  .321    .510  .563    .710  .887    .910  1.528
.115  .116    .315  .326    .515  .570    .715  .897    .915  1.557
.120  .121    .320  .332    .520  .576    .720  .908    .920  1.589
.125  .126    .325  .337    .525  .583    .725  .918    .925  1.623
.130  .131    .330  .343    .530  .590    .730  .929    .930  1.658
.135  .136    .335  .348    .535  .597    .735  .940    .935  1.697
.140  .141    .340  .354    .540  .604    .740  .950    .940  1.738
.145  .146    .345  .360    .545  .611    .745  .962    .945  1.783
.150  .151    .350  .365    .550  .618    .750  .973    .950  1.832
.155  .156    .355  .371    .555  .626    .755  .984    .955  1.886
.160  .161    .360  .377    .560  .633    .760  .996    .960  1.946
.165  .167    .365  .383    .565  .640    .765  1.008   .965  2.014
.170  .172    .370  .388    .570  .648    .770  1.020   .970  2.092
.175  .177    .375  .394    .575  .655    .775  1.033   .975  2.185
.180  .182    .380  .400    .580  .662    .780  1.045   .980  2.298
.185  .187    .385  .406    .585  .670    .785  1.058   .985  2.443
.190  .192    .390  .412    .590  .678    .790  1.071   .990  2.647
.195  .198    .395  .418    .595  .685    .795  1.085   .995  2.994


* Table VII was constructed by F. P. Kilpatrick and D. A. Buchanan from formula (36)
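The footnote credits Table VII to formula (36), which is not reproduced in this section. The following is a minimal sketch that assumes this is the usual transformation z' = ½ ln[(1 + r)/(1 − r)], the inverse hyperbolic tangent of r. Under that assumption it regenerates any entry, and `math.tanh` converts a z' back to r. The function name `z_prime` is illustrative only.

```python
import math

def z_prime(r):
    """Fisher's z' for a correlation coefficient: (1/2) ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))

for r in (0.300, 0.500, 0.900, 0.950):
    print(f"{r:.3f} {z_prime(r):.3f}")

# z' is atanh(r), so tanh recovers r
print(round(math.tanh(z_prime(0.456)), 3))
```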




TABLE VIII. The 5 (Roman Type) and 1 (Boldface Type) Per Cent Points for the Distribution of F* 


n1 degrees of freedom (for greater mean square) across; n2 degrees of freedom (for lesser mean square) down.
In each pair of rows, 5% gives the 5 per cent points (Roman type) and 1% the 1 per cent points (boldface type).

n2  P       1      2      3      4      5      6      7      8      9     10     11     12     14     16     20     24     30     40     50     75    100    200    500      ∞
 1  5%    161    200    216    225    230    234    237    239    241    242    243    244    245    246    248    249    250    251    252    253    253    254    254    254
    1%  4,052  4,999  5,403  5,625  5,764  5,859  5,928  5,981  6,022  6,056  6,082  6,106  6,142  6,169  6,208  6,234  6,258  6,286  6,302  6,323  6,334  6,352  6,361  6,366
 2  5%  18.51  19.00  19.16  19.25  19.30  19.33  19.36  19.37  19.38  19.39  19.40  19.41  19.42  19.43  19.44  19.45  19.46  19.47  19.47  19.48  19.49  19.49  19.50  19.50
    1%  98.49  99.00  99.17  99.25  99.30  99.33  99.34  99.36  99.38  99.40  99.41  99.42  99.43  99.44  99.45  99.46  99.47  99.48  99.48  99.49  99.49  99.49  99.50  99.50
 3  5%  10.13   9.55   9.28   9.12   9.01   8.94   8.88   8.84   8.81   8.78   8.76   8.74   8.71   8.69   8.66   8.64   8.62   8.60   8.58   8.57   8.56   8.54   8.54   8.53
    1%  34.12  30.82  29.46  28.71  28.24  27.91  27.67  27.49  27.34  27.23  27.13  27.05  26.92  26.83  26.69  26.60  26.50  26.41  26.35  26.27  26.23  26.18  26.14  26.12
 4  5%   7.71   6.94   6.59   6.39   6.26   6.16   6.09   6.04   6.00   5.96   5.93   5.91   5.87   5.84   5.80   5.77   5.74   5.71   5.70   5.68   5.66   5.65   5.64   5.63
    1%  21.20  18.00  16.69  15.98  15.52  15.21  14.98  14.80  14.66  14.54  14.45  14.37  14.24  14.15  14.02  13.93  13.83  13.74  13.69  13.61  13.57  13.52  13.48  13.46
 5  5%   6.61   5.79   5.41   5.19   5.05   4.95   4.88   4.82   4.78   4.74   4.70   4.68   4.64   4.60   4.56   4.53   4.50   4.46   4.44   4.42   4.40   4.38   4.37   4.36
    1%  16.26  13.27  12.06  11.39  10.97  10.67  10.45  10.27  10.15  10.05   9.96   9.89   9.77   9.68   9.55   9.47   9.38   9.29   9.24   9.17   9.13   9.07   9.04   9.02
 6  5%   5.99   5.14   4.76   4.53   4.39   4.28   4.21   4.15   4.10   4.06   4.03   4.00   3.96   3.92   3.87   3.84   3.81   3.77   3.75   3.72   3.71   3.69   3.68   3.67
    1%  13.74  10.92   9.78   9.15   8.75   8.47   8.26   8.10   7.98   7.87   7.79   7.72   7.60   7.52   7.39   7.31   7.23   7.14   7.09   7.02   6.99   6.94   6.90   6.88
 7  5%   5.59   4.74   4.35   4.12   3.97   3.87   3.79   3.73   3.68   3.63   3.60   3.57   3.52   3.49   3.44   3.41   3.38   3.34   3.32   3.29   3.28   3.25   3.24   3.23
    1%  12.25   9.55   8.45   7.85   7.46   7.19   7.00   6.84   6.71   6.62   6.54   6.47   6.35   6.27   6.15   6.07   5.98   5.90   5.85   5.78   5.75   5.70   5.67   5.65
 8  5%   5.32   4.46   4.07   3.84   3.69   3.58   3.50   3.44   3.39   3.34   3.31   3.28   3.23   3.20   3.15   3.12   3.08   3.05   3.03   3.00   2.98   2.96   2.94   2.93
    1%  11.26   8.65   7.59   7.01   6.63   6.37   6.19   6.03   5.91   5.82   5.74   5.67   5.56   5.48   5.36   5.28   5.20   5.11   5.06   5.00   4.96   4.91   4.88   4.86
 9  5%   5.12   4.26   3.86   3.63   3.48   3.37   3.29   3.23   3.18   3.13   3.10   3.07   3.02   2.98   2.93   2.90   2.86   2.82   2.80   2.77   2.76   2.73   2.72   2.71
    1%  10.56   8.02   6.99   6.42   6.06   5.80   5.62   5.47   5.35   5.26   5.18   5.11   5.00   4.92   4.80   4.73   4.64   4.56   4.51   4.45   4.41   4.36   4.33   4.31
10  5%   4.96   4.10   3.71   3.48   3.33   3.22   3.14   3.07   3.02   2.97   2.94   2.91   2.86   2.82   2.77   2.74   2.70   2.67   2.64   2.61   2.59   2.56   2.55   2.54
    1%  10.04   7.56   6.55   5.99   5.64   5.39   5.21   5.06   4.95   4.85   4.78   4.71   4.60   4.52   4.41   4.33   4.25   4.17   4.12   4.05   4.01   3.96   3.93   3.91
11  5%   4.84   3.98   3.59   3.36   3.20   3.09   3.01   2.95   2.90   2.86   2.82   2.79   2.74   2.70   2.65   2.61   2.57   2.53   2.50   2.47   2.45   2.42   2.41   2.40
    1%   9.65   7.20   6.22   5.67   5.32   5.07   4.88   4.74   4.63   4.54   4.46   4.40   4.29   4.21   4.10   4.02   3.94   3.86   3.80   3.74   3.70   3.66   3.62   3.60
12  5%   4.75   3.88   3.49   3.26   3.11   3.00   2.92   2.85   2.80   2.76   2.72   2.69   2.64   2.60   2.54   2.50   2.46   2.42   2.40   2.36   2.35   2.32   2.31   2.30
    1%   9.33   6.93   5.95   5.41   5.06   4.82   4.65   4.50   4.39   4.30   4.22   4.16   4.05   3.98   3.86   3.78   3.70   3.61   3.56   3.49   3.46   3.41   3.38   3.36
13  5%   4.67   3.80   3.41   3.18   3.02   2.92   2.84   2.77   2.72   2.67   2.63   2.60   2.55   2.51   2.46   2.42   2.38   2.34   2.32   2.28   2.26   2.24   2.22   2.21
    1%   9.07   6.70   5.74   5.20   4.86   4.62   4.44   4.30   4.19   4.10   4.02   3.96   3.85   3.78   3.67   3.59   3.51   3.42   3.37   3.30   3.27   3.21   3.18   3.16


* Table VIII is reproduced from Snedecor: Statistical Methods, Iowa State College Press, Ames, Iowa, by permission of the author and publisher. 


TABLE VIII. The 5 (Roman Type) and 1 (Boldface Type) Per Cent Points for the Distribution of F*—Continued

[Rows for n2 = 14 and above are not legible in this copy. Legible fragments: for n2 = 15, the 5 per cent points 4.54, 3.68, 3.29 and the 1 per cent points 8.68, 6.36, 5.42 (n1 = 1, 2, 3); for n2 = 25, the 5 per cent points 4.24, 3.38, 2.99, 2.76, 2.60, 2.49, 2.41, 2.34 and the 1 per cent points 7.77, 5.57, 4.68, 4.18, 3.86, 3.63, 3.46, 3.32 (n1 = 1 to 8).]


Answers to Examples 


Chapter 3

1. (a) Seven or more jumps; P = .0352
   (b) Ten or more jumps; P = .0191

2. (a) P = 1/70 = .0143
   (b) P = 16/70 = .2286
   (c) P = 36/70 = .5143
   (d) One wrong; Five or more correct; P = .0400

3. Eighteen combinations

4. (a) P = 1/4 = .25
   (b) P = 12/256 = .0469
   (c) P = 27/64 = .4219
   (d) Two
   (e) Twenty

5. Seven or more correct; P = .024
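The denominators in the answers to Example 2 (1/70, 16/70, 36/70) match the number of ways of choosing 4 items from 8, C(8, 4) = 70. The sketch below reproduces the three numerators as hypergeometric counts. It assumes that 4 of the 8 items are the "right" ones and that 4 are picked; that reading is ours and is not stated in the answer key. The function name `ways_with_correct` is illustrative only.

```python
from math import comb

def ways_with_correct(correct, chosen=4, right=4, total=8):
    """Ways to pick `chosen` of `total` items so that exactly `correct` are among the `right` ones."""
    return comb(right, correct) * comb(total - right, chosen - correct)

selections = comb(8, 4)   # 70 equally likely selections
for k in (4, 3, 2):
    w = ways_with_correct(k)
    print(f"{k} correct: {w}/{selections} = {w / selections:.4f}")
```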


Chapter 4

1. z = 1.79; P = .0367
2. Limits: 30 ± 10.41 = 19.59 to 40.41 = 20 to 40
3. X = 19.52 for z = 1.65. But X = 20 has 19.5 as a lower limit.
Thus, X = 21 or more; P < .05
4. X = 25.22 for z = 1.65. Thus 26 or more; P < .05
5. X = 9.30 for z = 1.65. Thus 10 or more; P < .05

χ² = 6.777; P < .01
13. χ² = 2.82; P > .05
14. z = 2.9; P = (2)(.0019) = .0038
15. χ² = 7.80; P < .01
16. (a) P = .057
    (b) P = .016
    (c) P = (2)(.083) = .166
17. (a) 20.7562
    (b) 354,715.78
    (c) 25.629
    (d) 160,220.275


Chapter 5

1. (a) z = 1.882; P > .05; not significant
   (b) z = 2.101; P < .05; significant
2. z = 2.932; P < .01; significant
3. χ² = 7.147; P < .01; significant
4. χ² = 1.14; P is approximately .285
5. χ² = .285; P > .50; not significant
6. χ² = 5.042; P < .05; significant
7. χ² = .842; P > .30; not significant
8. χ² = 28.006; P is much less than .01; significant
9. χ² = 2.700; P is approximately .10; not significant
10. χ² = 9.271; P < .01; significant
11. (a) χ² = 2.3741
    (b) z = 1.54; P = (2)(.0618) = .1236
12. By the direct method P = (2)(.0612) = .1224
13. (a) χ² = 13.1585
    (b) z = 3.63; P = (2)(.00014) = .00028
14. By the direct method P = (2)(.00010) = .00020
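Answers 11(a) and 11(b) show that a χ² with 1 df is the square of the normal deviate z (√2.3741 ≈ 1.54). They also show that the two-tailed P is twice the tail area in column C of Table III. That the χ² has 1 df is our assumption, suggested by the agreement of the two answers. The sketch below uses the standard library's complementary error function. The helper name `two_tailed_p` is hypothetical.

```python
import math

def two_tailed_p(z):
    """Probability of a unit normal deviate at least as far from zero as z, in either direction."""
    return math.erfc(abs(z) / math.sqrt(2.0))

chi2 = 2.3741            # Example 11(a), taken to have 1 df
z = math.sqrt(chi2)      # with 1 df, chi-square is the square of the normal deviate
print(round(z, 2))                     # compare 11(b): z = 1.54
print(round(two_tailed_p(1.54), 4))    # compare 11(b): (2)(.0618) = .1236
```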
Chapter 6

1. χ² = 44.717; df = 2; P < .01; significant
2. χ² = 9.838; df = 1; P < .01; significant
3. χ² = .482; df = 1; P > .30; not significant
4. χ² = 3.988; df = 2; P > .10; not significant
5. χ² = 3.665; df = 2; P > .10; not significant
6. χ² = 11.804; df = 5; P < .05; significant
7. χ² = 12.551; df = 5; P < .05; significant
8. χ² = 8.760; df = 2; P < .02; significant
9. (a) χ² = 5.435; df = 3; P > .10; not significant
   (b) χ² = 1.105; df = 3; P > .70; not significant
χ² = 9.168; df = 1; P < .01; significant




z = 2.268; P < .05; significant
z = 4.374; P < .01; significant
χ² = 54.576; df = 9; P < .01; significant
χ² = 23.820; df = 11; P < .02; significant
χ² = 46.650; df = 1; P < .01; significant
Approximately 60 at the 5 per cent level; 100 at the 1 per cent level


Chapter 7

1. Yes
. (a) 2 = 876; P = 87; not significant 
(b) r = .247; r1 = .157; r2 = .333
(a) z = .869; P = .38; not significant
(b) r = .175; r1 = .073; r2 = .274
r1 = .680; r2 = .902
r1 = .793; r2 = .844
z = .846; P = .40; not significant
(a) χ² = .833; df = 2; P > .50; not significant
(b) r = .456; r1 = .304; r2 = .585
(a) χ² = 1.85; df = 4; P > .70; not significant
(b) r = .326; r1 = .181; r2 = .457
(a) χ² = 27.587; df = 12; P < .01; significant
(b) It would not be legitimate to combine the samples
(a) χ² = 37.50; df = 14; P < .01; significant
(b) It would not be legitimate to combine the samples
(a) χ² = 15.023; df = 4; P < .01; significant
(b) It would not be legitimate to combine the samples
χ² = 4.33; df = 4; P > .30; not significant
χ² = 6.77; df = 5; P > .20; not significant
. x2 = 6.77; df = 5; P > .20; not significant 
. 2-080; P = .49; not significant 
381; P = .70; not significant 


Chapter 8

m1 = 20.11; m2 = 24.69
t = 2.80; df = 338; P < .01; significant
t = 4.38; df = 38; P < .01; significant
t = 2.01; df = 40; P < .01; significant
t = 2.90; df = 225; P < .01; significant




t = .260; df = 40; P > .05; not significant
t = 2.46; df = 48; P < .05; significant
t = 1.11; df = 24; P > .20; not significant
(a) Approximately 2,268, if t is taken at the 1 per cent
level; approximately 1,267, if t is taken at the 5 per cent level
(b) Approximately 83, if t is taken at the 1 per cent
level; approximately 45, if t is taken at the 5 per cent level

10. t = 1.74; df = 38; P > .05; not significant

11. Approximately 26, if t is taken at the 1 per cent level;
approximately 14, if t is taken at the 5 per cent level

12. t = 1.81; df = 28; P > .05; not significant




Chapter 9
1. (a) F = 3.16; df = 19 and 9; by interpolation into
the table of F we find that the obtained value is not significant
at the 5 per cent level
   (b) t = 1.59; df = 28; P > .10; not significant
   (c) Tabled value for 28 df
2. (a) F = 4.14; df = 19 and 19; P < .02; significant
   (b) t = 2.30
   (c) Tabled value for 19 df; t = 2.30 is significant with
P < .05
3. (a) F = 3.32; df = 24 and 24; P < .02; significant
   (b) t = 2.29
   (c) Tabled value for 24 df; t = 2.29 is significant with
P < .05

4. F = 5.20; df = 22 and 17; P < .02; significant; the
obtained value of t is 9.60; the approximate value required for
significance at the 5 per cent level is 2.08

5. F = 2.31; df = 69 and 69; P < .02; significant; the
obtained value of t is 7.79; the approximate value required for
significance will be the tabled value for 69 df

6. F = 3.27; df = 45 and 46; P < .02; significant; the
obtained value of t is 2.72; the approximate value required for
significance at the 5 per cent level is 2.01

7. F = 849; df = 107 and 77; P < .02; significant; the 
obtained value of t is 1.59 and does not meet the requirements 
of significance at the 5 per cent level 




Chapter 10 

1. The mean square between groups (31.67) is smaller 
than the mean square within groups (82.67) and obviously cannot 
be significantly larger; there is no need to calculate F 

2. F = 9.06; df = 1 and 40; P < .01; significant 

3. The mean square between groups (4.95) is smaller than
the mean square within groups (8.60) and obviously cannot be 
significantly larger; there is no need to calculate F 

4. The mean square between groups (53.79) is smaller 
than the mean square within groups (93.59) and obviously cannot 
be significantly larger; there is no need to calculate F 

5. The mean square between groups (7.52) is smaller than 
the mean square within groups (24.42) and obviously cannot be 
significantly larger; there is no need to calculate F 

6. F = 6.52; df = 4 and 45; P < .01; significant

7. (a) F = 11.09; df = 3 and 60; P < .01; significant 

(b) The mean square between groups (5.44) is smaller 
than the mean square within groups (8.50) and obviously cannot 
be significantly larger; there is no need to calculate F 
(c) The analysis of (b) indicates that the means re-
corded by Operators A, C, and D are homogeneous. Inspection
of the data also shows that Operator B obtains a mean which is
lower than the means of the others. Something is apparently
wrong with the technique of recording used by B


Chapter 11
1. χ² = .10; df = 2; P is approximately .95; not significant
2. χ² = 2.48; df = 4; P is approximately .70; not significant
3. (a) Operator:    A       B       C       D
       X          7.48    3.11    6.85    6.27
       s          3.11    1.24    3.02    2.33

   (b) χ² = 14.93; df = 3; P < .01; significant

   (c) Operator:    A       B       C       D
       X          .901    .585    .865    .839
       s          .160    .181    .169    .151




   (d)
   Source of Variation        Sum of Squares    df   Mean Square       F
   Between groups                     1.1512     3         .3837    13.8
   Within groups                      1.6682    60         .0278
   Total                              2.8194    63

4. χ² = 9.38; df = 3; P < .05; significant
5. χ² = 29.06; df = 2; P < .01; significant

6. (a) Group:       A        B        C
       X        15.30     2.40     4.70
       s        26.23     4.27     9.12
   (b)
   Source of Variation        Sum of Squares    df   Mean Square       F
   Between groups                     946.87     2        473.44   35.84
   Within groups                      356.60    27         13.21
   Total                            1,303.47    29
   (c) Group:       A        B        C
       X         3.93     1.57     2.14
       s          .39      .47      .68
   (d)
   Source of Variation        Sum of Squares    df   Mean Square       F
   Between groups                    30.2896     2         15.14    29.7
   Within groups                     13.9160    27
   Total                             44.2056    29

Chapter 12
2.
Source of Variation              Sum of Squares    df   Mean Square        F
Type                                  72,657.20     1     72,657.20   337.47
Background                             1,328.60     1      1,328.60     6.17
Time                                 212,843.82     1    212,843.82   988.59
Type × background                         69.72     1         69.72
Type × time                           33,434.12     1     33,434.12   155.29
Background × time                          2.10     1          2.10
Type × background × time                 351.44     1        351.44     1.63
Residual within groups                84,397.00   392        215.30
Total                                405,084.00   399

3.
Source of Variation              Sum of Squares    df   Mean Square        F
Background                                16.53     1         16.53     3.77
Type size                                  1.53     1          1.53
Background × type size                     7.03     1          7.03     1.60
Residual within groups                   122.88    28          4.39
Total                                    147.97    31
4.
Source of Variation              Sum of Squares    df   Mean Square        F
A, B                                      27.56     1         27.56     7.39
1, 2                                       1.56     1          1.56
I, II                                      1.56     1          1.56
A, B × 1, 2                                9.00     1          9.00     2.41
A, B × I, II                                .25     1           .25
1, 2 × I, II                                .25     1           .25
A, B × I, II × 1, 2                         .57     1           .57
Residual within groups                   209.00    56          3.73
Total                                    249.75    63
5.
Source of Variation              Sum of Squares    df   Mean Square        F
Colleges                                  90.00     1         90.00     2.65
Tests                                    360.00     1        360.00    10.59
Colleges × tests                         160.00     1        160.00     4.71
Residual within groups                 1,224.00    36         34.00
Total                                  1,834.00    39
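In these answer tables each mean square is the sum of squares divided by its df. Each F is that mean square divided by the residual (within groups) mean square. The small sketch below uses the figures from Example 5 above and reproduces the error mean square 34.00 and the F values 2.65, 10.59, and 4.71. The helper name `anova_rows` is hypothetical.

```python
def anova_rows(effects, ss_error, df_error):
    """Mean square = SS / df for each source; F = mean square / error mean square."""
    ms_error = ss_error / df_error
    rows = {}
    for source, (ss, df) in effects.items():
        ms = ss / df
        rows[source] = (ms, ms / ms_error)
    return rows, ms_error

rows, ms_error = anova_rows(
    {"Colleges": (90.00, 1), "Tests": (360.00, 1), "Colleges x tests": (160.00, 1)},
    ss_error=1224.00,
    df_error=36,
)
print(f"error mean square {ms_error:.2f}")
for source, (ms, f) in rows.items():
    print(f"{source:<18} {ms:8.2f} {f:6.2f}")
```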


6. The table may be set up as indicated below. The column
headings have been omitted intentionally. You should be able to
label them from the data of Example 4. Signs have been entered
in the first column. Complete the entries. Note that the signs
in the first column could have been taken as minus instead of plus.
This would change the signs of the sums at the right, but, since they
are squared, this would have no influence upon the mean squares.




Sums                                56  58  60  61  52  53  41  47    Sum   (Sum)²/kn
Background                          +                                  42
Type size                           +                                  10
Illumination                        +                                 -10
Background × type                   +                                 -24
Background × illumination           +                                   4
Type × illumination                 +                                   4
Background × type × illumination    +
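One way to check such a table is to attach a +/− pattern to the eight cell sums and total them. The patterns below are an assumption: cells are taken to run with background changing slowest and illumination fastest. That arrangement is our reading, chosen because it reproduces the printed Sum column (42, 10, −10, −24, 4, 4). We also assume kn = 64 total observations, which fits the 63 total df in the Example 4 table. The resulting (Sum)²/kn values then match the Example 4 sums of squares 27.56, 1.56, 1.56, 9.00, .25, and .25. The three-factor sum works out to −6, giving .5625, close to the printed .57.

```python
import math

# Cell sums printed in the Example 6 table, in the order given there.
sums = [56, 58, 60, 61, 52, 53, 41, 47]

# Assumed sign patterns (hypothetical cell order): background slowest, illumination fastest.
background   = [+1, +1, +1, +1, -1, -1, -1, -1]
type_size    = [+1, +1, -1, -1, +1, +1, -1, -1]
illumination = [+1, -1, +1, -1, +1, -1, +1, -1]

def interaction(*patterns):
    """Sign pattern of an interaction: the cell-by-cell product of the main-effect signs."""
    return [math.prod(signs) for signs in zip(*patterns)]

effects = {
    "Background": background,
    "Type size": type_size,
    "Illumination": illumination,
    "Background x type": interaction(background, type_size),
    "Background x illumination": interaction(background, illumination),
    "Type x illumination": interaction(type_size, illumination),
    "Background x type x illumination": interaction(background, type_size, illumination),
}

KN = 64  # assumed k * n, the total number of observations behind the eight sums

contrast = {name: sum(s * x for s, x in zip(signs, sums)) for name, signs in effects.items()}
ss = {name: total ** 2 / KN for name, total in contrast.items()}

for name in effects:
    print(f"{name:<34} {contrast[name]:>4} {ss[name]:8.4f}")
```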
Chapter 13 
2. 
Source of Variation Sum of Squares df Mean Square 
Type face 1,628.19 3 542.73 
Type size 2,817.44 2 1,408.72 
Type face X type size 1,079.36 6 179.89 
Residual within groups 19,397.20 48 404.11 
Total 24,922.19 59 
3. (a)
Source of Variation Sum of Squares df Mean Square
S(ex) of subjects 689.06 1 689.06
I(nstructions) 14.06 1 14.06
B(arrier) 315.06 1 315.06
E(xperimenter) 264.06 1 264.06
S × I 1.56 1
1.56 1
175.56 1
14.06 1
33.06 1
1.56 1
14.06 1
14.06 1
45.56 1
22.56 1
150.06 1
Pooled interactions (error) 11 43.61*
Total† 1,761.90 15

F 


1.84 
3.49 


F 
15.80 
7.22 
6.06 


* For the pooled sum of squares for interactions, based upon 11 df, which
is used as the error term.

† Because of rounding errors in the calculation of the sums of squares,


(b) χ² = 10.07; df = 10; P is approximately .50 and not significant.
4. (a) χ² = 6.11; df = 5; P is approximately .30 and not significant.
(b) 


Source of Variation Sum of Squares df Mean Square F 


Between drugs 742.02 1 742.02 35.37
Concentrations 12,177.30 2 6,088.65 290.21
Drug × concentration 127.03 2 63.52 3.03
Residual within groups 1,132.90 54 20.98

Total 14,179.25 59 

6. (a) 

Source of Variation Sum of Squares df Mean Square 
A, B, C 477.78 2 238.89
1, 2 200.00 1 200.00
X, Y, Z 811.11 2 405.56
A, B, C × 1, 2 100.00 2 50.00
A, B, C × X, Y, Z 7,022.22 4 1,755.56
1, 2 × X, Y, Z 833.33 2 416.66
A, B, C × 1, 2 × X, Y, Z 1,066.67 4 266.67

Total 10,511.11 17 


(b) χ² is 7.356 and is of borderline significance for 3 df.
Applying the correction factor, we obtain a χ² of 6.875, which is
not significant for 3 df at the 5 per cent level.
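The probabilities quoted for these χ² values come from tables. For integer degrees of freedom the upper-tail probability also has a closed-form series, so it can be computed directly; the sketch below uses only Python's standard `math` module, and the function name `chi2_sf` is illustrative.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square >= x) for an integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)**k / k! for k = 0 .. df/2 - 1
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: the 1-df tail is erfc(sqrt(x/2)); each further pair of df adds
    # sqrt(2/pi) * exp(-x/2) * x**(j - 1/2) / (1 * 3 * ... * (2j - 1)).
    p = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df + 1) // 2):
        p += term
        term *= x / (2 * j + 1)
    return p


for label, x, df in [("Ex. 3(b)", 10.07, 10), ("Ex. 4(a)", 6.11, 5),
                     ("Ex. 6(b) uncorrected", 7.356, 3),
                     ("Ex. 6(b) corrected", 6.875, 3)]:
    print(f"{label:22s} chi-square = {x:6.3f}, df = {df:2d}, "
          f"P = {chi2_sf(x, df):.3f}")
```

For χ² = 10.07 with 10 df it gives P of about .43, which the answer above reports from the table as approximately .50; for 6.11 with 5 df it gives about .30, and both values in 6(b) stay above .05.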


Chapter 14 
1.
Source of Variation Sum of Squares df Mean Square F
Groups 20.00 2 10.00 22.73
Levels of ability 128.00 9 14.22 32.32
Residual (interaction) 8.00 18 .44
Total 156.00 29 


the total obtained by adding the column values, 1,761.90, does not quite equal
the value obtained by direct calculation, 1,761.94.


2. (a)
Source of Variation Sum of Squares df Mean Square
Methods (columns) 496.87 4 124.22
Within columns 1,740.50 25 69.62
Total 2,237.37 29

(b)
Source of Variation Sum of Squares df Mean Square
Methods (columns) 496.87 4 124.22
Schools (rows) 1,587.77 5 317.55
Residual (interaction) 152.73 20 7.64
Total 2,237.37 29

3. (a)
Source of Variation Sum of Squares df Mean Square F
Between groups 1.80 1 1.800 5.10
Between pairs 82.06 9 9.118 25.83
Residual (interaction) 3.18 9 .353
Total 87.04 19

(b) t = .6/.266 = 2.26; t² = 5.1

4. (a)
Source of Variation Sum of Squares df Mean Square
Between groups 10.00 1 10.00
Within groups 20.00 8 2.50
Total 30.00 9

(b)
Source of Variation Sum of Squares df Mean Square F
Between groups 10.00 1 10.00 2.67
Between pairs 5.00 4 1.25
Residual (interaction) 15.00 4 3.75
Total 30.00 9

(c) t = 2.0/1.0 = 2.0; t² = 4.00
(d) t = 2.0/1.225 = 1.633; t² = 2.67
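These answers illustrate that, for two groups, t² equals the F of the corresponding analysis of variance. A minimal sketch, assuming the standard error of the difference is √(2·MS/n), with MS the within-groups mean square for independent groups or the residual (interaction) mean square for matched groups, and with n = 10 pairs in Example 3 and n = 5 in Example 4 (read off the 9 and 4 df for pairs). The function name is hypothetical.

```python
import math


def t_for_difference(mean_diff, ms_error, n):
    """t for the difference between two means, each based on n cases.

    The standard error of the difference is sqrt(2 * ms_error / n); ms_error
    is the within-groups mean square for independent groups, or the residual
    (interaction) mean square for matched groups.
    """
    se = math.sqrt(2.0 * ms_error / n)
    return mean_diff / se, se


# Example 3(b): mean difference .6, 10 pairs, residual mean square .353
t3, se3 = t_for_difference(0.6, 0.353, 10)
print(round(se3, 3), round(t3, 2), round(t3 ** 2, 2))  # F in 3(a): 1.800/.353

# Example 4(c) and (d): mean difference 2.0, 5 cases per group
t_c, _ = t_for_difference(2.0, 2.50, 5)      # within-groups MS from 4(a)
t_d, se_d = t_for_difference(2.0, 3.75, 5)   # residual MS from 4(b)
print(round(t_c, 2), round(t_c ** 2, 2))      # F in 4(a): 10.00/2.50
print(round(se_d, 3), round(t_d, 3), round(t_d ** 2, 2))  # F in 4(b): 10.00/3.75
```

The matched-groups t in (d) is smaller than the independent-groups t in (c) here because the residual mean square (3.75) happens to exceed the within-groups mean square (2.50).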


5.
Source of Variation Sum of Squares df Mean Square F
Methods 26.67 2 13.34 2.20
Levels of ability 82.00 29 2.83
Residual (interaction) 351.33 58 6.06
Total 460.00 89

Chapter 15

1. (a)
Source of Variation Sum of Squares df Mean Square F
Methods 156.84 2 78.42
Subjects in same group 2,419.47 12 201.62
Total between subjects 2,576.31 14
Time of test 461.37 2 230.68 81.22
Time × methods 23.83 4 5.96 2.10
Pooled subjects × time 68.13 24 2.84
Total within subjects 553.33 30
Total 3,129.64 44

(b)
Source of Variation Sum of Squares df
Subjects in same group 2,419.47 12
Pooled subjects × time 68.13 24
Total within cells 2,487.60 36

2. (a)
Source of Variation Sum of Squares df Mean Square F
Instructions 510.53 2 255.26 4.67
Subjects in same group 983.60 18 54.64
Total between subjects 1,494.13 20
Trials 1,971.23 4 492.81 68.83
Trials × instructions 70.71 8 8.84 1.23
Pooled subjects × trials 515.26 72 7.16
Total within subjects
Total


(b) 

Source of Variation Sum of Squares df Mean Square F 
Instructions 66.95 2 33.48 3.24 
Within groups 186.00 18 10.33 

Total 252.95 20 

(c)

Source of Variation Sum of Squares df Mean Square F 
Instructions 105.53 2 52.76 3.55 
Within groups 267.71 18 14.87 

Total 373.24 20 
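In the design of Example 2(a), several groups are each measured on repeated trials, so two error terms appear: the between-subjects effect (instructions) is tested against the subjects-in-same-group mean square, while trials and the trials × instructions interaction are tested against the pooled subjects × trials mean square. The sketch below makes that split explicit; the function name is hypothetical.

```python
def repeated_measures_f(between, subjects_error, within, pooled_error):
    """F ratios when several groups are each measured repeatedly.

    `between` and `within` map source names to (sum_of_squares, df) pairs;
    `subjects_error` and `pooled_error` are (sum_of_squares, df) pairs.
    Between-subjects effects are divided by the subjects-in-same-group mean
    square, within-subjects effects by the pooled subjects x trials one.
    """
    def ms(pair):
        ss, df = pair
        return ss / df

    f = {name: ms(p) / ms(subjects_error) for name, p in between.items()}
    f.update({name: ms(p) / ms(pooled_error) for name, p in within.items()})
    return f


f = repeated_measures_f(
    between={"Instructions": (510.53, 2)},
    subjects_error=(983.60, 18),
    within={"Trials": (1971.23, 4), "Trials x instructions": (70.71, 8)},
    pooled_error=(515.26, 72),
)
for name, value in f.items():
    print(f"{name:22s} F = {value:.2f}")
```

Because it divides unrounded mean squares, the sketch gives about 68.86 and 1.24 for the two within-subjects F's, where the table (dividing mean squares already rounded to two decimals) shows 68.83 and 1.23.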

3. 

Source of Variation Sum of Squares df Mean Square F 
Attitudes 33.08 2 16.54 1.28 
Subjects in same group 540.90 42 12.88 

Total between subjects 573.98 44 
Time of test 58.94 1 58.94 3.44 
Type of test 61.25 1 61.25 3.58 
Attitude × time 59.34 2 29.67 1.73
Attitude × type 619.03 2 309.52 18.09
Type × time 18.05 1 18.05 1.05
Attitude × type × time 12.24 2 6.12
Pooled subjects × tests 2,155.90 126 17.11

Total within subjects 2,984.75 135 

Total 3,558.73 179 

4. 

Source of Variation Sum of Squares df Mean Square F 
Months 248.0 1 248.0 2.21 
Individuals 4,337.0 17 255.1 2.27 
Residual (interaction) 1,907.7 17 112.2 


Total 6,492.7 35 




5. 
Source of Variation Sum of Squares df Mean Square F 
Days 130.00 3 43.33 1.38 
Individuals 136.00 4 34.00 1.08 
Residual (interaction) 378.00 12 31.50 
Total 644.00 19 
Chapter 16 
1. (a) 
Source of Variation Sum of Squares df Mean Square F 
Time of testing 11.20 4 2.80 
Subjects 11.20 4 2.80 
Treatments 22.80 4 5.70 
Residual error 146.80 12 12.23 
Total 192.00 24 
(b) Expected values:
Time of Testing
Subjects 1 2 3 4 5
1 3.0 7.0 8.0 5.8 4.6
2 3.8 4.0 4.4 5.2 5.6
3 4.0 3.0 5.0 2.6 4.4
4 4.0 4.8 4.2 5.2 2.8
5 2.6 6.2 6.4 3.2 4.6
2. (a) 
Source of Variation Sum of Squares df Mean Square F 
Exposure speed 7.60 4 1.90
Subjects 26.00 4 6.50 1.36 
Dial type 169.20 4 42.30 8.87 
Residual error 57.20 12 4.77 
Total 260.00 24 
(b) Original variable: Dial H O S R V
X̄ 4.0 .2 4.6 2.2 8.0
s² 8.5 .2 3.8 2.2 8.0
√(X + .5): X̄ 2.03 .81 2.22 1.57 2.89
s² .49 .05 .20 .28 .22




(c) 

Source of Variation Sum of Squares df Mean Square F 
Exposure speed .71 4 .18
Subjects 1.30 4 .32 1.28
Dial type 11.92 4 2.98 11.92
Residual error 2.98 12 .25

Total 16.91 24 

3.

Source of Variation Sum of Squares df Mean Square F
Days .50 3 .17
Dogs 5.69 3 1.90 1.38 
Dosage 5.26 3 1.75 1.27 
Residual error 8.25 6 1.38 

Total 19.70 15 

4. 

Source of Variation Sum of Squares df Mean Square F 
Days .30 3 .10 1.0
Dogs 10.56 3 3.52 35.2
Dosage 1.61 3 .54 5.4
Residual error .61 6 .10

Total 13.08 15 


5. (a) F = s₁²/s₂² = 1.38/.10 = 13.8. A value of F equal
to 8.47 is significant at the 2 per cent level (this is a two-tailed
test) for the 6 and 6 degrees of freedom available. Since 13.8 exceeds
this value, the two mean squares are not homogeneous.

(b) If the dogs have been randomly assigned to the 2 
Latin squares, and if the day variable (time separating the dosages) 
is constant, and if the same dosages have been administered to the 
2 groups of animals, then about the only logical interpretation 
to be placed upon the finding is that the reaction of the dogs to a 
particular dosage is in some way dependent upon prior dosages, or 
that the experimental technique is not reliable.

6. If the 2 residual mean squares had not differed signifi- 
cantly, the analysis might have taken the following form: 




Source of Variation Sum of Squares df 


Days .68 3
Dogs 18.96 7 
Dosage 5.28 3 
Residual error 10.56 18 
Total 35.48 31 
7. (a)
Source of Variation Sum of Squares df Mean Square F
Rabbits 2,145.00 3 715.00 40.10
Days 154.50 3 51.50 2.89
Dosage 968.50 3 322.83 18.11
Residual error 107.00 6 17.83
Total 3,375.00 15

Source of Variation Sum of Squares df Mean Square F
Rabbits 4,994.19 3 1,664.73 28.96
Days 120.69 3 40.23
Dosage 477.69 3 159.23 2.77
Residual error 344.87 6 57.48
Total 5,937.44 15

Source of Variation Sum of Squares df Mean Square F
Rabbits 646.25 3 215.42 10.86
Days 217.25 3 72.42 3.65
Dosage 1,563.25 3 521.08 26.28
Residual error 119.00 6 19.83
Total 2,545.75 15

Source of Variation Sum of Squares df Mean Square F
Rabbits 1,403.19 3 467.73 2.98
Days 644.69 3 214.90 1.37
Dosage 1,158.69 3 386.23 2.46
Residual error 941.37 6 156.90
Total 4,147.94 15




(b) χ² (corrected) = 8.97; df = 3; P is less than .05,
and the hypothesis of homogeneity is not tenable.
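The corrected χ² in (b) is Bartlett's test applied to the four residual mean squares of part (a), each with 6 df. A sketch using natural logarithms, with the correction factor C = 1 + [Σ(1/νᵢ) - 1/Σνᵢ] / [3(k - 1)]; feeding in the printed (rounded) mean squares is an assumption made so that the result can be compared with the answer. The function name is illustrative.

```python
import math


def bartlett_chi2(mean_squares, dfs):
    """Bartlett's test for homogeneity of several error mean squares.

    Returns (uncorrected chi-square, correction factor C, corrected
    chi-square); the corrected value has len(mean_squares) - 1 df.
    """
    k = len(mean_squares)
    total_df = sum(dfs)
    pooled = sum(v * s2 for v, s2 in zip(dfs, mean_squares)) / total_df
    chi2 = total_df * math.log(pooled) - sum(
        v * math.log(s2) for v, s2 in zip(dfs, mean_squares))
    c = 1 + (sum(1 / v for v in dfs) - 1 / total_df) / (3 * (k - 1))
    return chi2, c, chi2 / c


# Residual mean squares of the four Latin squares in part (a), 6 df each
chi2, c, corrected = bartlett_chi2([17.83, 57.48, 19.83, 156.90], [6, 6, 6, 6])
print(round(chi2, 2), round(c, 4), round(corrected, 2))
```

It gives an uncorrected value of about 9.59, C of about 1.0694, and a corrected χ² of about 8.97 on 3 df.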


8. 
Source of Variation Sum of Squares df Mean Square F 
Independent observations: 
Order of presentation 60.69 4 15.17 
Residual between individuals (error) 2,414.80 20 120.74

Total between individuals 2,475.49 24


Correlated observations: 


Size of screen 272.61 4 68.15 9.46 
Trials 42.21 4 10.55 1.46 
Residual from Latin square (error) 103.58 12 8.63 1.20
Residual within individuals (error) 576.40 80 7.20
Total within individuals 994.80 100 
Total for experiment 3,470.20 124 
9. 
Source of Variation Sum of Squares df Mean Square F 
Groups (columns) 74.5 3 24.83
Lists (rows) 359.5 3 119.83 1.19
Test 4,626.5 3 1,542.17 15.26
Residual error 606.5 6 101.08
Total 5,667.0 15
Chapter 17 
1. (a) 
Source of Variation Sum of Squares df Mean Square F 
Between groups 3.73 2 1.86
Within groups 291.47 42 6.94

Total 295.20 44


(b)
Source of Variation Sum of Squares df Mean Square F
Between groups 100.31 2 50.16 4.11
Within groups 512.93 42 12.21
Total 613.24 44
(c)
Source of Variation Sum of Squares of Errors of Estimate df Mean Square F
Total 461.85 43
Within groups 386.02 41 9.42
Adjusted means 75.83 2 37.92 4.03
2. (a)
Source of Variation Σx² Σxy Σy²
Between groups (columns) 3.73 19.07 100.31
Between matched subjects 77.20 45.40 143.91
Residual (interaction) 214.27 146.93 369.02
Total 295.20 211.40 613.24
Residual + between groups 218.00 166.00 469.33
(b)
Source of Variation Sum of Squares of Errors of Estimate df Mean Square F
Residual + between groups 342.93 29
Residual 268.27 27 9.94
Adjusted means 74.66 2 37.33 3.76
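In this matched-groups version, the adjusted sum of squares for means is the difference between the errors of estimate for "Residual + between groups" and for "Residual", each computed as Σy² - (Σxy)²/Σx² from the part (a) rows. A sketch; the helper name is illustrative, and the df (29, 27, 2) are taken from the table.

```python
def errors_of_estimate(sxx, sxy, syy):
    """Sum of squares of errors of estimate: sum y^2 - (sum xy)^2 / sum x^2."""
    return syy - sxy ** 2 / sxx


# (sum x^2, sum xy, sum y^2) rows from part (a)
residual = (214.27, 146.93, 369.02)
residual_plus_groups = (218.00, 166.00, 469.33)

ss_residual = errors_of_estimate(*residual)               # 27 df
ss_combined = errors_of_estimate(*residual_plus_groups)   # 29 df
ss_adjusted = ss_combined - ss_residual                   # 29 - 27 = 2 df
f_adjusted = (ss_adjusted / 2) / (ss_residual / 27)
print(round(ss_combined, 2), round(ss_residual, 2),
      round(ss_adjusted, 2), round(f_adjusted, 2))
```

It reproduces 342.93, 268.27, 74.66 and an F of about 3.76.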
3. (a)
Source of Variation Σx² Σxy Σy²
Between groups 56.05 112.32 1,024.33
Within groups 461.69 237.16 497.69
Total 517.74 349.48 1,522.02
(b)
Source of Variation Sum of Squares of Errors of Estimate df Mean Square F
Total 1,286.12 22
Within groups 375.87 19 19.78
Adjusted means 910.25 3 303.42 15.34


4. (a)
Source of Variation Σx² Σxy Σy²
Between groups 6.38 22.72 81.00
Within groups 214.57 16.15 149.00
Total 220.95 38.87 230.00
(b)
Source of Variation Sum of Squares of Errors of Estimate df Mean Square F
Total 223.16 14
Within groups 147.78 13 11.37
Adjusted means 75.38 1 75.38 6.63
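The adjusted-means F in Examples 3 and 4 follows the same pattern for independent groups: total and within-groups errors of estimate are each Σy² - (Σxy)²/Σx², and their difference, on k - 1 df, is tested against the within-groups value on N - k - 1 df. A sketch under the assumption that k and N can be read off the df column (Example 3: 22 = N - 2 and 19 = N - k - 1, so N = 24 and k = 4; Example 4: N = 16 and k = 2); the function name is hypothetical.

```python
def ancova_adjusted_means(between, within, k, n_total):
    """One-way analysis of covariance from sums of squares and cross products.

    between, within: (sum x^2, sum xy, sum y^2) tuples; k groups and n_total
    subjects. Returns (total SS_e, within SS_e, F for adjusted means).
    """
    def ss_e(sxx, sxy, syy):
        # sum of squares of errors of estimate
        return syy - sxy ** 2 / sxx

    total = tuple(b + w for b, w in zip(between, within))
    ss_total = ss_e(*total)       # n_total - 2 df
    ss_within = ss_e(*within)     # n_total - k - 1 df
    df_means, df_within = k - 1, n_total - k - 1
    f = ((ss_total - ss_within) / df_means) / (ss_within / df_within)
    return ss_total, ss_within, f


ex3 = ancova_adjusted_means((56.05, 112.32, 1024.33),
                            (461.69, 237.16, 497.69), k=4, n_total=24)
ex4 = ancova_adjusted_means((6.38, 22.72, 81.00),
                            (214.57, 16.15, 149.00), k=2, n_total=16)
for name, (t, w, f) in [("Example 3", ex3), ("Example 4", ex4)]:
    print(name, round(t, 2), round(w, 2), round(f, 2))
```

The printed values should agree with the tables above: 1,286.12, 375.87 and 15.34 for Example 3, and 223.16, 147.78 and 6.63 for Example 4.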


AUTHOR INDEX AND SUBJECT INDEX


Author Index 




Alexander, H. W., 175, 288, 302, 359, 
360 


Bartlett, M. S., 166, 195, 196, 198, 
199, 200, 201, 203, 211, 256, 262, 
317, 330, 359 

Baxter, B., 175, 359 

Benepe, O. J., 166, 359 

Bliss, C. I., 203, 261, 306, 311, 318, 
328, 329, 359 

Bloomers, P., 359 

Brandt, A. E., 105, 229, 359 

Brozek, J., 175, 302, 360 

Buchanan, D. A., 126, 409 

Bugelski, B. R., 309, 310, 360 

Burke, C. J., 95, 364 

Burks, B., 138 

Burt, C., 360 

Butsch, R. L. C., 158, 360


Child, I. L., 203, 204, 261, 360 

Churchman, C. W., 29, 360 

Clark, M., 173, 360 

Cochran, W. G., 106, 167, 169, 360, 
368 

Cox, G. M., 167, 360, 362 

Crespi, L. P., 203, 360 

Cronbach, L. J., 360 

Crump, S. L., 195, 360 

Crutchfield, R. S., 252, 360 


Dallenbach, K. M., 233, 362 
Daniels, H. E., 195, 360 

De Lury, D. B., 329, 330, 361 
Dressel, P. L., 139, 361 
Duncan, A. J., 366 

Dunlap, J. W., 175, 361, 383 


Edwards, A. L., 90, 114, 246, 299, 
300, 361 


Eisenhart, C., 195, 203, 361 
Englehart, M. D., 361 


Fertig, J. W., 301, 362 

Festinger, L., 167, 362 

Fisher, R. A., 15, 22, 27, 28, 30, 34, 
83, 84, 113, 117, 124, 126, 128, 
136, 137, 152, 163, 165, 166, 169, 
174, 175, 181, 203, 232, 256, 307, 
308, 345, 362, 400, 407, 408 

Foster, W. S., 39, 362 

Fowler, R. G., 94, 365 

Franklin, M., 94, 365 


Garner, F. H., 362 

Garrett, H. E., 175, 303, 305, 362, 367 
Gaskill, H. V., 362 

Gilliland, A. R., 288, 362 

Glanville, A. D., 233, 362 

Goulden, C. H., 67, 256, 362 
Grafton, J. B., 140, 362 

Graham, F. K., 172, 362 

Grant, D. A., 175, 303, 309, 363 


Hammond, K. R., 172, 363 
Hartman, G., 118, 363 

Hellman, M., 92, 363 

Hey, B. B., 166, 363 

Hill, J., 102, 365 

Hoel, P. G., 46, 53, 164, 228, 363 
Horst, P., 246, 361 

Horton, G. P., 108, 319, 368 
Horton, H. B., 363 

Humphreys, D. W., 288, 362 
Humphreys, L. G., 288, 319, 363


Irwin, J. O., 363 


Jackson, R. W. B., 175, 363 
Jellinek, E. M., 364 




Johnson, P. O., 364 


Kempthorne, O., 134, 364 
Kendall, B. S., 172, 362 
Kendall, M. G., 19, 21, 22, 53, 59, 117,
180, 308, 364, 368, 378
Kendler, H. H., 171, 364 
Kennedy, J. L., 38, 364 
Kilpatrick, F. P., 126, 409 
Kogan, L. S., 288, 364 
Kreezer, G. L., 233, 362 
Kuenne, M. R., 95, 364 
Kurtz, A. K., 383 


Leahy, A. M., 138, 364 
Lewis, D., 95, 170, 364 
Lewis, H. B., 94, 365 
Lindquist, E. F., 175, 288, 359, 
365 
Long, L., 102, 365
Loucks, R. B., 193, 365 


McNemar, Q., 88, 89, 365 
Maier, N. R. F., 114, 365 
Marks, E. S., 365 

Mather, K., 105, 365 

Merrill, M. A., 16, 367 
Merritt, C. B., 94, 365 

Moore, K., 172, 365 

Morgan, J. J. B., 158, 191, 365 
Mowrer, O. H., 357, 365 


Newcomb, T. M., 206, 366 
Nisbet, S. D., 332, 366 


Pearson, E. S., 166, 366 
Pearson, K., 84, 366 
Peatman, J. G., 22, 366 
Peters, C. C., 280, 366 


Rose, C. L., 306, 311, 318, 328, 329, 
359 
Rosenzweig, S., 116, 366 


Sanders, H. G., 362 

Schafer, R., 22, 366 

Shen, E., 175, 366 

Sherman, M., 10, 366 

Simpson, T. W., 366 

Sleight, R. B., 200, 204, 328, 366 

Smith, B. B., 19, 21, 22, 117, 364, 378 

Smith, J. G., 366 

Snedecor, G. W., 105, 113, 134, 139, 
163, 167, 181, 182, 203, 204, 228, 
366, 407, 408, 410 

Spence, K. W., 11, 366, 367 


Taylor, W. S., 236, 367 
Terman, L. M., 16, 367 
Thomson, G. H., 332, 367 
Tippett, L. H. C., 22, 367 
Tolman, E. C., 360 
Treloar, A. E., 367
Tsao, F., 364 


Uspensky, J. V., 35, 367 
Vinacke, W. E., 303, 306, 318, 367 


Walker, H. M., 145, 367 
Wishart, J., 174, 256, 367 
Worcester, D. A., 173, 360 
Wyatt, R. F., 357, 358, 307 


Yates, F., 22, 83, 92, 99, 169, 203, 
229, 232, 256, 307, 362, 367, 368

Young, D. M., 330 

Yule, G. U., 21, 22, 53, 59, 180, 368 


Zubin, J., 175, 303, 305, 362, 367 




Subject Index 




Analysis of covariance, 333-355 
and adjusted sum of squares be- 
tween groups, 345-356 
applications of, 335, 355 
and correlation within groups, 346- 
348
efficiency of, 347-348 
error mean square in, 346-348 
and matched groups, 349-355 
sums of squares of errors of estimate 
in, 340-341 
tests of significance in, 346 
Analysis of variance, 174-190, 208- 
260, 264-280, 284-297, 303-327 
angular transformations in, 202- 
203 
applications of, in psychological re- 
search, 175, 185, 200-201, 209, 
232-233, 237, 247-255, 264- 
269, 273-274, 278-280, 286- 
296, 303-306, 315-318, 319-327 
applied to several independent 
samples, 186-189 
and degrees of freedom, 179-180, 
190, 212, 215-216, 225, 237, 
247, 253, 286-288, 296-297, 
304-305, 316 
and F test, 181-182 
factorial design in, 208-232, 237- 
260, 305, 317-318 
and homogeneity of the error 
variance, 195-204, 211-212, 
316-317 
and independence of mean squares, 
180-181 
and Latin square design, 303-327 
logarithmic transformations in, 
202-203 
and matched groups design, 264-
280 


Analysis of variance (Cont.) 
mean square within groups in, 177- 
179 
and null hypothesis, 183-184 
partitioning of the total sum of 
squares in, 176-177 
and reciprocal transformation, 203 
relation to t test in the case of two 
independent groups, 182-184 
in the case of two matched 
groups, 276-278 
and repeated observations on the 
same subjects, 284-297 
on several groups of independent
subjects, 288-296
and residual variance, 181 
of a series of independent Latin 
squares, 315-317 
and square root transformation,
199-200 
and sum of squares between groups, 
177 
within groups, 177 
summary of calculations for simple 
applications of, 189-190 
and test of significance, 181-182 
total sum of squares in, 177 
transformations of scale in, 181, 
198-199 
for two groups, 182-184 
Angular transformations, 202-203 
Area, under normal curve, 54, 58 


Between groups mean square, 179- 
180 
Between groups sum of squares, 177 
Bias, 157 
in assigning subjects to experimen- 
tal conditions, 20-21 
in averaging z' values, 136-138




Bias (Cont.) 
in sampling, 18 
Binomial distribution, 52-53, 142-143 
discrete nature of, 56 
mean of, 57 
parameters of, 57-58 
relation to normal, 56-57 
standard deviation of, 57 
Binomial expansion, 43-45, 62 
Binomial probabilities, 43-45 
approximation of, from table of 
normal curve, 51, 58-60 
calculation of, with successive 
multipliers, 46 


Chi-square, 63-68, 80-83, 85-86, 89- 
91, 97-114 
applied to correlated proportions, 
89 
uncorrelated proportions, 80-82 
and binomial probabilities, 63-66 
calculated from a j X 2 table, 104-
106 
calculation of, from 2 X 2 table, 
80-82, 85-86 
and correction for continuity, 61, 
67, 83, 86, 90-91, 109 
degrees of freedom for, 63-64, 66, 
81-82, 98-99, 104, 113-114, 
196-197 
distribution of, 63-66, 98, 134 
with more than 30 degrees of freedom, 113-114
relation to z, 66-67 
test for correlation coefficients, 135-
136 
for homogeneity of variance, 
195-198 
and tests of technique, 108-113 
use of table of, 64 
Coefficient of variation, 154-155 
influence of changes in, on the t
test, 156-157
Combinations, 35-37, 208 
Comparisons, planning of, 101, 106, 
108 


Continuous variables, 1-2, 6-7, 143 
Control group, 264, 285 
Correction for continuity, 56, 61, 64, 
67, 83, 86, 90-91, 99, 109 
correlated proportions, 90-91 
uncorrelated proportions, 82-83 
in using χ² to evaluate binomial
probabilities, 64 
in using table of normal curve to
evaluate binomial probabilities, 56
Correlation, influence of, on test of 
significance, 91, 270-273, 277- 
278 
between initial and final perform- 
ance of subjects, 265 
and Latin square design, 323-326 
between means, 277 
and variances, 201-202 
between proportions, 86-87 
and regression, 336-338 
between standard deviations and 
means, 202-203 
and standard error of difference 
between means of matched 
groups, 277-278 
between subjects in matched groups 
design, 270-273 
use of normal curve to test hy- 
pothesis of zero, 122-123 
use of t to test hypothesis of zero, 
123-125 
between variances and means, 199 
within groups, 346-348 
Correlation coefficient, 120-138 
average value of, obtained from 
several samples, 133-134 
in two samples, 132-133 
calculated from a 2 X 2 table, 87- 
88 
fiducial limits for, 128-131, 136 
relation of sample size to signifi- 
cance of, 125-126 
reliability of, and sample size, 130-
131 
sampling distribution of, 121-123 




Correlation coefficient (Cont.) 
and testing of hypotheses, 120-138 
transformations of, 123, 126-128 
use of table of significant values of, 
125-126 
and z' transformation, 126-128
Covariance, 285, 338 
analysis of, 333-355 
Cross products, sum of, 338-340 


Degrees of freedom, 63, 66, 81-82, 98- 
100, 104, 106, 113-114, 123-124, 
144-145, 151, 155, 164-165, 170,
179-180, 190, 196-197, 215-216, 
223, 277, 304-305, 337-338 
and analysis of variance, 179, 190 
for χ², 63-64, 81-82, 98-99, 104,
113-114, 196-197 
and F, 164-165 
in the factorial design, 215-216 
for interaction, 216, 223 
in Latin square design, 304-305 
and mean square between groups, 
179-180 
within groups, 179 
for r X c table, 104 
and regression coefficient, 337-338 
and t, 123-124, 144-145 
for t test of correlation, 124
of matched groups, 277 
for test of significance of difference 
between means, 151 
Dependent variables, 13 
Diagonal square, 308-309 
Discrete variables, 2-8, 97 
Distribution, binomial, 52-53 
of χ², 63-66, 98, 134
of correlation coefficient, 121-123 
cumulative percentage, 199 
of difference between means of 
samples, 149-151 
between proportions, 75-76 
of F, 163-164 


frequency, 19 
normal, 53-56, 126-128, 143, 199 


of sample means, 17-20, 142-143 




Distribution (Cont.) 
sampling, 17-20 
skewed, 121, 123, 142-148, 166- 
167, 199 
of standard deviation, 17-20 
of t, 143-144 
theoretical frequency, 52-53
of variance, 17-20 
of z', 126-128 
Distribution functions, 53 


Efficiency, of experimental designs, 
334 
of Latin square design, 312 
of matched groups design, 270-273, 
275-278
Enumeration data, 2 
Error, of estimate, 337, 340-341 
estimates of, 26, 180-181, 247-252, 
268-270, 287, 294-296, 305, 
317, 325-326, 346-348 
interactions as estimates of, 247- 
252, 270 
kinds of, in testing hypotheses, 29- 
30 
in t test as a result of nonnormality,
165-167 
Error mean square, in the analysis of 
covariance, 346-348 
in analysis of variance, 180-181 
in designs involving repeated meas- 
urements of the same subjects, 
287 
involving repeated measurements 
on several independent groups, 
294-296 
in experiments involving matched 
groups, 268-269 
in Latin square design, 305, 317, 
323, 325-326 
Error mean squares, pooling of, from 
Latin squares, 315-317, 325-326 
significant differences in, 198-199 
test for homogeneity of, 195-198 
Error variance, 180-181 
and replication, 216-217 




Estimate, of error, 26, 180-181, 247-
252, 268-270, 287, 294-296, 305, 
317, 323, 325-326, 346-348 

of a parameter based upon pooled 
frequencies, 76-77 
of population correlation obtained 
from several samples, 132-134 
of population standard deviation 
based upon pooled data, 150- 
151 
of population values, 18 
of population variance in the 
analysis of variance, 179-180 
based upon pooled data, 150 
of size of samples for repetition of
an experiment, 153-155
Experimental controls, 37-39, 48, 
108, 113, 182 
Experimental design, 1, 12, 20-22, 26, 
34-35, 42, 73, 175, 232-233, 264-
266, 270-273, 275, 280, 284-285, 
288-295, 315-317, 319-327, 333- 
334, 346-348, 355 
and analysis of covariance, 333-355 
of variance, 175 
efficiency of matched groups, 270- 
273, 275-278 
importance of considering inter- 
actions in, 278-280 
involving a difference between fre- 
quencies or proportions, 73 
involving matched groups, 264-
280 
involving repeated measurements 
of same subjects, 284-297 
on several independent groups, 
288-295 
involving replication of the same 
Latin square, 319-327 
the Latin square, 303-327 
and matching variables, 273-274 
and randomization, 20-22 
and replication with a series of in- 
dependent Latin squares, 315- 
317 
sensitivity of, 42 


Experimental error, 180-181, 247- 
252, 268-270, 287, 294-296, 305, 
317, 325-326, 346-348, 353 

Experimental group, 264, 285 

Experimental technique, 108-113, 317


F, in analysis of variance, 181-182 
distribution of, 163-164 
reciprocal of, 228 
relation of to t, 184, 277 
use of table of, 164-165 
Factorial design, 12, 208-232, 237- 
260, 305, 317-318 
advantages of, 232-233 
complex, 237-260 
higher-order interactions in, 237 
in Latin square, 317-318 
method for showing comparisons 
in, 229-232 
and orthogonal comparisons, 232 
test for homogeneity of variance 
in, 211-212, 220-221 
without replication, 252-254, 256- 
257 
Factorials, 35 
Fiducial limits, 128-131 
for correlation coefficient, 128-131, 
136 
for difference between means, 152- 
153 
for mean, 144-146 
and sample size, 130-131, 136, 147 
Fiducial probability, 147-148 
First-order interactions, 223 
Fourfold point coefficient, 88 
Frequencies, exact nature of, 2 
pooling of, 76-77 
and probability, 27 
Frequency distributions, 19 


Homogeneity of variance, 152, 164-
165, 167-169, 195-204, 211-212,
315-317 

and angular transformation, 202- 
203 




Homogeneity of variance (Cont.) 

and factorial design, 211-212, 220- 
221 

and Latin squares, 315-317 

and logarithmic transformation,
202-203 

and reciprocal transformation, 203 

and square root transformation, 
199-202 

and t test, 152, 165, 167-169

test of hypothesis of, 164-165, 195- 
198 

and transformations of scale, 198- 
204 

Hypothesis, acceptance of false, 29- 
30 

of homogeneity of variance, 162- 
165, 195-198 

meaning of statistical test of, 27-28 

null, 24, 27, 35, 75-77, 79, 89, 151- 
152, 183-184 

rejection of true, 29-30

source of, from data, 101 

tested by F, 181-182 

by t, 144-146, 151-152 
of a uniform distribution, 98-102 
of zero correlation, 122-125 


Independent variables, 13 
Interactions, assumptions involved in 
pooling, 254-256 
calculation of higher-order, 242-246 
degrees of freedom for, 216, 223 
in designs involving repeated meas- 
urements of same subjects, 
287-288 
as estimates of error, 247-252, 270
first-order, 223
importance of considering, 278-280
interpretation of, 212, 217-219,
227-228, 248
involving subjects, 291-294
in matched groups design, 278-280
with one degree of freedom, 214-
215
pooling of, 250, 254-256


Interactions (Cont.) 
second-order, 225, 227-228, 242- 
246 
simple, 223 
testing for homogeneity of, 258- 
259 
testing significance of, and replica- 
tion, 259-260 
Interpolation, 59, 65 
Intersubject variation, 201 
Intervals, corresponding to discrete 
categories, 5-7 
Intrasubject variation, 291 
Inverse probability, 148 
Items, quantification of responses to, 
6-7 


j X 2 table, and χ² test, 104-106


Latin squares, 303-327 
analysis of variance of, 310-312 
applications of, 315-317 
correlated observations in, 323-326 
efficiency of, 312 
and factorial designs, 317-318 
and homogeneity of variance, 315- 
317 
and independent observations, 320-
323 
and methods experiments, 327 
replication of independent, 315-317 
of the same, 319-327 
residual mean square in, 305 
systematic, 308-310
Levels of significance, 78-79, 163-164 
Logarithmic transformation, 202-203 


Matched groups, and experimental 
design, 264-280 
and t test, 276-278
Matching variables, 273-274 
Mean, arithmetic, 17
of binomial distribution, 57 
fiducial limits for, 144-146 
sample size and reliability of, 142, 
147 




Mean (Cont.) 
sampling distribution of, 17-20, 
142-143, 147 
of sampling distributions, 18 
standard error of, 19, 142, 178 
as an unbiased estimate, 18 
variance of, 178 
Mean gains, standard error of dif- 
ference between, 285 
Means, correlated with standard de- 
viations, 202-203 
with variances, 199 
difference between, 23-26 
fiducial limits for difference be- 
tween, 152-153 
sampling distribution of difference 
between, 149 
significance of difference between 
and heterogeneity of variance, 
167-169 
standard error for difference be- 
tween independent, 150-151 
of difference between for matched 
groups, 276-277 
Mean squares, between groups, 179- 
180 
independence of, in the analysis of 
variance, 180-181, 199 
significant differences in error, 198-
199 
test of significance of, 181-182 
testing for homogeneity of, 195-198 
testing whether one is significantly 
smaller than another, 228 
within groups, 177-179 
Measurements, approximate nature 
of, 1-2 
in psychology, 3-9 
reliability of, in relation to random 
variation, 26 
repeated on same subjects, 284-297 
Methods experiments, 288 
and Latin square design, 327 


Nonnormality, and t test, 165-167
and transformations of scale, 199 




Normal curve, approximation of 
binomial probabilities from, 58- 
60 
area under, 54-58 
and distribution of z’, 126-128 
equation of, 53 
ordinate of, 54 
parameters of, 53 
and test of hypothesis of zero cor- 
relation, 122-123 
use of tables of, 51, 54, 78, 113-114, 
143 
Normal distribution, 53-56 
relation to binomial, 56-57 
Normal probability paper, 199 
Null hypothesis, 24, 27, 35, 75-77, 79, 
89, 151-152, 183-184 
and analysis of variance, 183- 
184 
and difference between means, 151-
152
between proportions or
frequencies, 75-76


Observations, continuous, 1 

correlation between, 86-90, 120- 

138, 296-297 

discrete, 2 

qualitative, 2 

quantitative, 1 
One-tailed tests, 78-79, 132, 182 
Ordinate, of normal curve, 54 
Organismic variables, 1, 7-9 
Orthogonal comparisons, 232 


Parameters, 18-19 
of binomial distribution, 57-58 
of normal curve, 53 
Percentages, angular transformation 
for, 202-203 
Permutations, 35-37, 208 
Phi coefficient, 88 
Polls, public opinion, 16 
Pooling, of data to arrive at a com-
mon estimate of the population
standard deviation, 150




Pooling (Cont.) 
of estimates of error of Latin 
squares, 315-317, 325-326 
of frequencies, 76-77 
of interactions, 254-256
Populations, 15 
estimate of mean of, 17-18, 144-
147 
of variance of, 17-18, 150, 163- 
165, 177-180, 195-198 
Practice effects, testing significance 
of, 286-288 
Prediction, in psychology, 9-13 
Probabilities, approximation of bi- 
nomial, 51, 58-60 
exact method for, in 2 X 2 table, 
84-85 
obtained from binomial, 43-45 
Probability, 20, 27-28 
fiducial, 147-148 
and independent events, 43-45 
inverse, 148 
Proportions, correction for continuity 
for, 67-68, 82-83, 90-91
null hypothesis concerning dif- 
ference between, 75-76 
sampling distribution of difference 
between uncorrelated, 76-77 
significance of difference between 
correlated, 86-90 
between uncorrelated, 80-82 
standard error of, 58 
standard error of difference be- 
tween correlated, 87-89 
between uncorrelated, 76-77 
Psychology, nature of research in, 1, 9-12


Quantitative variables, 1-2 
Qualitative variables, 2-3 
Randomization, and experimental de- 
sign, 20-22 
and Latin square design, 306-310 
and uncontrolled sources of varia- 
tion, 26 


Random numbers, table of, 20, 22-23, 
34 
Random selection, 19-20 
Random variation, 26 
Ratings, 5-6 
Reciprocal transformations, 203 
Regression coefficient, 336, 340-341 
Regression, and correlation, 336-
338 
Regression equation, 336 
Relative deviate, 58, 75 
Reliability, of correlation coefficient 
and sample size, 130-131 
of mean and sample size, 142, 147 
of measurements in relation to 
random variation, 26 
Repeated measurements, on same 
subjects, 284-297 
Replication, 26, 30, 269 
absence of, in factorial design, 252-
254, 256-257 
and error variance, 216-217 
and factorial designs, 252-254 
of independent Latin squares, 315- 
317 
in relation to standard errors, 26 
of same Latin square, 319-327 
and testing significance of inter- 
actions, 259-260 
Research, in psychology, 1, 9-12 
samples in, 15-17 
Residual mean square, in Latin
square design, 305
Residual sum of squares, in Latin
square design, 312-315
Residual variance, 181
in matched groups design, 269-270
Response variables, 1, 4-7


Samples, biased, 18 
checking representativeness of, 16- 
17 
difference between variances in re- 
lation to size of, 162-163 
distribution of correlation coeffi- 
cient in relation to size of, 121 


Samples (Cont.)


of standard deviation in relation 
to size of, 144 
estimate of size of, 153-155 
random selection of, 19-20
reliability of correlation coefficient 
in relation to size of, 130- 
131 
in research, 15-17 
size of, in relation to significance of
correlation, 125-126 
size of, and reliability of mean, 142, 
147 
unbiased methods of selecting, 18 
Sampling distribution, 17-20, 54 
of correlation coefficient, 121-123
of difference between means, 24,
150-151
between proportions or frequencies, 75-76
of mean, 17-20, 142-143, 147
Scores, 6-7
Second-order interactions, 225
calculation of, 242-246
interpretation of, 227-228
Significance, practical versus statistical, 30-31
Significance levels, 78-79, 163-164
Significance points, 78-79
Significance standards, 28
Significance tests, 27-28
in analysis of variance, 181-182
for correlation coefficients, 199.
132, 135-136
of difference between means, 151-
152
between proportions, 80-82, 86-
90

influence of correlation on, 91 
involving one and two tails of a 
normal distribution, 77-79 
Simple interactions, 223 
Skewness, of sampling distribution of 
correlation coefficient, 121-123 
and sampling distribution of mean, 
142-143 
and t test, 166-167
and transformations of scale, 199
Square root transformations, 199-205
Standard deviation, of binomial dis-
tribution, 57
correlation of, with means, 202-
203
distribution of, 17-20
and sample size, 144
Standard error, 19
of correlation coefficient, 121
of difference between correlated
proportions, 87-89
between mean gains, 285
between means, 150-151
between means of matched
groups, 276-277
between means and sample size,
153-155
between uncorrelated propor-
tions, 76-77
between z' values, 131-132
of estimate, 337
of the mean, 19, 142, 147, 178
of a proportion, 58
relation of, to replication, 26
of z', 198
Statistical significance, 28, 30-31
Statistics, nature of, 18-19
Subjects, assignment of, in
experiments, 20-22
Sum of cross products, between
groups, 339
total, 339
within groups, 339
Sum of squares, between groups, 177
of errors of estimate, 337, 340-
341
for interaction, 214-215
total, 177
within groups, 177
Systematic errors, and Latin square
design, 308-310

t, and changes in coefficient of varia- 
tion, 156-157 
distribution of, 143-144 
and equality of variances, 152 
and estimation of sample size, 153- 
155 
and heterogeneity of variance, 165, 
167-169 
influence of nonnormality on, 165- 
167 
relation of, to F, 184, 277 
test applied to matched groups, 
276-278 
test for difference between means 
with equal n’s and heterogene- 
ous variance, 170 
with unequal n’s and hetero- 
geneity of variance, 167-169 
test of difference between inde- 
pendent means, 150-152 
between means with hetero- 
geneity of variance, 167-169 
test for hypothesis of zero correla- 
tion, 123-125 
use of table of, 123-124, 144 


Technique, test of, 108-113 
Test scores, 6-7 
Test of technique, 108-113 
Tests of hypotheses, nature of, 27-28 
Tests of significance, 27-28, 77-79, 
80-82, 86-91, 123-125, 131-132, 
151-155, 164-165, 167-169, 181- 
182, 268-269, 286-287, 294-296, 
305 
in analysis of variance, 181-182 
and coefficient of variation, 153-
155
for designs involving repeated
measurements of the same
subjects, 286-287, 294-296
for difference between correlated
proportions, 86-90
between correlation coefficients,
131-132
between means, 151-152
of difference between means with
heterogeneous variance, 167-
169
between uncorrelated propor-
tions, 80-82
in experiments involving matched
groups, 268-269
for homogeneity of variance, 164-
165, 195-198
for hypothesis of zero correlation,
123-125
influence of correlation on, 91
in Latin square design, 305
one-tailed, 78-79, 132, 182
two-tailed, 78-79, 132, 182
Total sum of squares, in the analysis
of variance, 176-177
Transformations, 198-204
angular, 202-203
for correlation coefficient, 123
logarithmic, 202-203
and nonnormality, 199
reciprocal, 203
and skewness, 199
square root, 199-202
Triple interactions, 225
Two-tailed test, 78-79, 132, 182


Unbiased estimates, 17-20, 157 


Variables, continuous, 1-2, 6-7, 143 
dependent, 13 
discrete, 2-3, 97 
independent, 13 
matching, 273-274 
organismic, 1, 7-9
in psychological research, 1, 9-12 
qualitative, 2-3 
quantitative, 1-2 
response, 1, 4-7 
stimulus, 1, 3-4 
Variance, 17-18, 196 
analysis of, 174-190, 208-260, 264-
280, 281-297, 303-327
correlation of, with means, 199 
equality of, and t test of signifi-
cance, 152
error, 180-181, 216-217 
estimation of components of, 195 
estimation of population, 17-18, 
150, 163-165, 177-180, 195-198 
heterogeneity of, and t test, 169- 
170 
and transformations of scale, 
198-204 
and square root transformation, 
199-202 
and t test, 162-163
of means, 178
residual, 181
residual, in Latin square design,
305
in matched groups design, 269-
270
significance of difference between,
and sample size, 162-163
stabilizing, 199
test of homogeneity of, 164-165,
195-198
Variance ratio, 163 
Variation, coefficient of, 154-155 
uncontrolled, estimates of, 180-181 


Within-groups mean square, 177-
179
Within-groups sum of squares, 177

z' transformation for correlation co-
efficient, 126-138